LaScaMolMR

Overview

LaScaMolMR.jl (Large Scale Molecular Mendelian Randomization) is a threaded Mendelian Randomization (MR) package that is focused on the generation of transcriptome wide / molecular MR analysies. Although it provides an interface for most common MR regression estimators (Inverse Variance Weighted, Weighted Median, Egger, Wald), its intended use is to enable fast Omic-wide Mendelian Randomization studies. The rise of large genetic cohort data has benefited the statistical power of Genome Wide Association Studies (GWAS) and Quantitative Trait Loci (QTL). Thus enabling findings in extensive studies such as Transcriptome Wide MR (TWMR). LaScaMolMR.jl provides a fast and efficient framework to such analyses, allowing users to customize the parameters of the study.

Tutorial

Example 1 : Cis-MR

For a QTL dataset composed as follows :

base/folder
├── exposureA
│   ├── exposureA_chr1.txt
│   └── exposureA_chr2.txt
├── exposureB
│   ├── exposureB_chr1.txt
│   └── exposureB_chr2.txt
└── exposureC
    ├── exposureC_chr3.txt
    └── exposureC_chr4.txt

With example data composed like this (tab separated) :

chr	pos	effect_allele	other_allele	beta	se	some_other_column pval
1	10511	A	G	-0.176656	0.136297	.   0.194939
1	10642	A	G	-0.724554	0.345390	..  0.035924
1	11008	G	C	-0.017786	0.016673	... 0.286088
1	11012	G	C	-0.017786	0.016673	..  0.286088
1	13110	A	G	0.013272	0.021949	.   0.545411
1	13116	G	T	-0.027802	0.013111	..  0.0339672
1	13118	G	A	-0.027802	0.013111	... 0.0339672
1	13259	A	G	-0.122207	0.210776	..  0.562052
1	13273	C	G	0.007077	0.015337	.   0.644463

and a GWAS of outcome composed similarly but comma separated, the following code will generate a cis-MR study with default parameters :

using LaScaMolMR
  1. Describe the exposure data structure (files and columns).
path_pattern = ["exposure", TRAIT_NAME, "/exposure", TRAIT_NAME, "_chr", CHR, ".txt"]
columns = Dict(1 => CHR, 2 => POS, 3 => A_EFFECT, 4 => A_OTHER, 5 => BETA, 6 => SE, 8 => PVAL)

trait_v = ["A", "B", "C"] # exposure trait identifiers
chr_v = [1, 2, 3] # chromosome corresponding to exposure protein or gene
tss_v = [45287, 984276, 485327765] # position of TSS, if relevant

exposure::QTLStudy = QTLStudy_from_pattern("base/folder/", 
                                            path_pattern, 
                                            trait_v, chr_v, tss_v, 
                                            columns, separator = '\t')
  1. Describe the outcome data.
outcome = GWAS("/some/file", columns, separator = ',', trait_name = "Some Painful Disease")
  1. Perform Medelian Randomization study by providing input formats and reference genotype data files.
# Plink 1.9 files base names for each chromosome (You can also use a single file)
plink_files = ["folder/basename_file_chr$(i)" for i in 1:22]

# Perform MR for every exposure - outcome pairs with default parameters
out_table = mrStudy(exposure, outcome, "cis", plink_files)

# with MiLoP approach to internal pleiotropy and other parameters :
out_table_MiLoP = mrStudy(exposure, outcome, "cis", plink_files, 
                        approach = "MiLoP", 
                        r2_tresh = 0.01, 
                        p_tresh = 5e-8,
                        p_tresh_MiLoP = 5e-4,
                        pval_bigfolat = true)

Example 2 : Trans-MR

For a QTL dataset composed as follows :

base/folder
├── exposureA.txt
├── exposureB.txt
└── exposureC.txt

With example data composed like this (tab separated) :

chr	pos	A1	A2	beta	se	some_useless_column pval
1	10511	A	G	-0.176656	0.136297	.   0.194939
1	10642	A	G	-0.724554	0.345390	..  0.035924
1	11008	G	C	-0.017786	0.016673	... 0.286088
1	11012	G	C	-0.017786	0.016673	..  0.286088
1	13110	A	G	0.013272	0.021949	.   0.545411
1	13116	G	T	-0.027802	0.013111	..  0.0339672
1	13118	G	A	-0.027802	0.013111	... 0.0339672
1	13259	A	G	-0.122207	0.210776	..  0.562052
1	13273	C	G	0.007077	0.015337	.   0.644463

and a GWAS of outcome composed similarly but comma separated, the following code will generate a trans-MR study with default parameters :

using LaScaMolMR
  1. Describe exposure data.
path_pattern = ["exposure", TRAIT_NAME, ".txt"]
columns = Dict(1 => CHR, 2 => POS, 3 => A_EFFECT, 4 => A_OTHER, 5 => BETA, 6 => SE, 8 => PVAL)

trait_v = ["A", "B", "C"] # Chromosome and TSS informations are not relevant in Trans setting.

exposure::QTLStudy = QTLStudy_from_pattern("base/folder/", 
                                            path_pattern, 
                                            trait_v, chr_v = nothing, 
                                            tss_v = nothing, 
                                            columns, separator = '\t', 
                                            only_corresp_chr = false)
  1. Describe outcome data.
outcome = GWAS("/some/file", columns, separator = ',', trait_name = "Some Painful Disease")
  1. Perform Medelian Randomization study by providing input formats and reference genotype data files.
# Plink 1.9 files base names for each chromosome (You can also use a single file)
plink_files = ["folder/basename_file_chr$(i)" for i in 1:22]

# Perform MR for every exposure - outcme pairs with default parameters
out_table = mrStudy(exposure, outcome, "trans", plink_files)

# with MiLoP approach and other parameters :
out_table2 = mrStudy(exposure, outcome, "trans", plink_files, 
                        approach = "MiLoP", 
                        r2_tresh = 0.01, 
                        p_tresh = 5e-8,
                        filter_beta_ratio = 1)
# Default p_tresh_MiLoP value is the same as p_tresh

MiLoP approach

Mitigated Local Pleiotropy (MiLoP) approach modifies the potential IV selection process to remove instrument variables associated to more than 1 exposure at p_tresh_MiLoP significance level.

mr_* regression functions

LaScaMolMR provides four MR regression functions : mr_wald, mr_ivw, mr_egger and mr_wm performing respectively the wald ratio of the first provided instrument, Inverse Variance Weighted, Egger and Weighted Median regressions.

Example :

out = mr_wm(beta_outcome, se_outcome, beta_exposure, se_exposure;
            iteration = 10000, seed = 42, α = 0.05)

mr_output as a standard output for mr_* :

The mroutput struct provies a standard interface to functions performing mendelian randomization. This allows for user to use its own MR functions for mrStudy. Any function receiving 4 vectors, having at least $\alpha$ as an option and outputing a `mroutput` is valid.

Here is the definitioin of mr_output :

struct mr_output
    nivs::Int
    effect::Float64
    se_effect::Float64
    ci_low::Float64
    ci_high::Float64
    p::Float64
    intercept::Float64
    p_intercept::Float64
    ci_low_intercept::Float64
    ci_high_intercept::Float64
    heter_stat::Float64
    heter_p::Float64
end

Any function following this format could be provided to ClumpAndMR/mrStudy in the mr_methods option, including user-defined functions.

function mr_something(beta_outcome, se_outcome, beta_exposure, se_exposure; 
                      α = 0.05, other_params = default_value
                     )::mr_output

    # Calculate values
    ...
    # end

    # Not calculated values can be replaced by NaN
    return mr_ouput(nivs, effect, ci_low, ci_high, p, NaN, NaN, NaN, NaN, heter_stat, heter_p)

end

output = NaiveCis(data, genotypes; mr_methods = [mr_ivw, mr_something])

API

Index

Provide inputs

These functions allow users to specify the input formats. The QTL summary statistics can be united in a single file or divided by exposure, chromosome or both.

For the sake of simplicity, we consider a genetic association study as a GWAS when a single phenotype was studied and as a QTLStudy otherwise.

LaScaMolMR.GenVarInfoType

Enum of different information present in qtl file path or inside QTL/GWAS text delimited files.

values :

TRAIT_NAME  # Name of the trait (e.g. QTL protein name)
CHR         # chromosome
POS         # position in chromosome
A_EFFECT    # effect allele of SNP
A_OTHER     # reference allele of SNP
BETA        # effect size of SNP
SE          # standard error of effect size
PVAL        # Pvalue for H₀ : BETA = 0
source
LaScaMolMR.GWASType

Type GWAS which contains informations about GWAS file

path        # path to file
columns     # Dictionary of column number => GenVarInfo type of info at column
separator   # column separator
trait_name  # name of the trait (nothing if not specified)

Example

gwas = GWAS("path/to/file", 
            Dict(1 => PVAL, 2 => CHR, 3 => POS, 8 => BETA, 9 => SE),
            ',')
source
LaScaMolMR.QTLStudyType

Type QTLStudy which contains informations about QTL file(s) format and implacement.

The method QTLStudy_from_pattern helps building information from patterns in file names and is the prefered methods to construct a QTLStudy.

fields :

path_v                  # vector of file names (path)
traits_for_each_path    # Trait corrpespondinf to each path (nothing if files are not trait specific)
trait_v                 # names of all traits included
chr_v                   # chromosomes corresponding to each trait (nothing if not relevant)
tss_v                   # tss position corresponding to each trait (nothing if not relevant)
columns                 # Dictionary of column number => GenVarInfo type of info at column
separator               # column separator
source
LaScaMolMR.QTLStudy_from_patternFunction

Make QTLStudy which contains informations about QTL file(s) format and implacement from some file pattern

arguments :

folder is the main folder cntaining all QTL files.
path_pattern is a vector of strngs and GenVarInfo characterizing the pattern of every file name in the folder.
trait_v is an collection of traits that should be incuded in future analysis.
chr_v is a vector of each chromosome of each QTL trait.
tss_v is a vector of every Transcription start site.
columns is a dictionary of all informations contained in columns of QTL files.
separator is the separator of QTL files.
only_corresp_chr indicates if only variants on chromosome correspoding to QTL trait should be kept. Default is true. Set to false for Trans MR studies.

TIP : if your file contains a chromosome:position column (e.g. 1:324765), consider setting your separator to [':', sep]

Examples

for a sigle file QTL :

QTLStudy_from_pattern("some/folder", ["qtl.txt"], tv, cv, tssv,
                      Dict(1 => CHR, 2 => POS, 8 => PVAL, ...),
                      separator = ' ',
                      only_corresp_chr = true)

for this arhitecture (in UNIX) :

test/data
├── exposureA
│   ├── exposureA_chr1.txt
│   └── exposureA_chr2.txt
├── exposureB
│   ├── exposureB_chr1.txt
│   └── exposureB_chr2.txt
└── exposureC
    ├── exposureC_chr3.txt
    └── exposureC_chr4.txt

we get this code :

path_pattern = ["exposure", TRAIT_NAME, "/exposure", TRAIT_NAME, "_chr", CHR, ".txt"]
trait_v = ["A", "B", "C"]
chr_v = [1, 2, 3]
tss_v = [45287, 984276, 485327765]

qtl = QTLStudy_from_pattern("test/data", path_pattern, ttrait_v, chr_v, tss_v,
                            Dict(1 => CHR, 2 => POS, 8 => PVAL, ...),
                            separator = ' ',
                            only_corresp_chr = true)
source

Perform a MR Study

The function mrStudy allows the users to perform meta-analysis over many pairs of traits given the input format. The function has three implementations depending on the type of studies used as exposure and outcomes. (GWAS -> GWAS, QTL -> GWAS, GWAS -> QTL)

LaScaMolMR.mrStudyFunction

Perform a Mendelian Randomization study with exposure QTL and outcome GWAS

arguments :

exposure::QTLStudy : exposure QTL data
outcome::GWAS : outcome gas data
type::AbstractString : specifies if the analysis is a trans or cis MR analysis (is either "trans" or "cis")
bedbimfam_dirnames::AbstractArray{<:AbstractString} : base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :
approach::String: name of MR study aproach chosen (either naive, test or MiLoP) (default is "naive")
p_tresh::Float64: pvalue threshold for a SNP to be considered associated to an exposure (default is 1e-3)
window::Integer: maximal distance between a potential Instrument Variable and transciption start site of gene exposure (default is 500000)
`r2
tresh::Float64`: maximial corrlation between to SNPs (default is 0.1)
exposure_filtered::Bool : If true, the exposure files are considered already filtered will not filtered on distance to tss and level of significance (default is false)
mr_methods::AbstractVector{Function} : Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw, mr_egger, mr_wm and mr_wald. default value is [mr_ivw, mr_egger]
α::Float64 : α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)
trsf_pval_exp::Union{Function, Nothing} : Transformation to apply to pvalues in exposure dataset
trsf_pval_out::Union{Function, Nothing} : t = Transormation to apply on pvalues in outcome dataset
low_ram::Bool : If true, if the exposure files each contain only one exposure trait, mrStudyNFolds with n_folds of 10 will be used.
write_ivs::AbstractString : write selected Instrumental variables to specified directory
write_filtered_exposure::AbstractString : write a filtered version of exposure files to specified file name. This file will be tab separated and will only contain columns necessary for further MR Studies.
filter_beta_raio::Real : Filter IVs for which the exposure effect is filter_beta_raio times outcome effect or greater. default is 0.
pval_bigfloat::Bool : use true if pvalues can be under 5e-324. (default is false)
min_maf::Real : minimal variant maf to be kept as a potential IV. (default is 0)
infos::Bool : If true, infos about advancement compute are printed to terminal (default is true)

Examples

results = mrStudy(qtl, gwas, "trans", genotypes, 10, approach = "naive", window = 250000, trsf_pval_exp = x -> exp10.(x))
source

Perform a Trans-Mendelian Randomization study with exposure GWAS and outcome GWAS

arguments :

exposure::GWAS : exposure QTL data
outcome::GWAS : outcome GWAS data
bedbimfam_dirnames::AbstractArray{<:AbstractString} : base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :
approach::String: name of MR study aproach chosen (either naive, test or MiLoP) (default is "naive")
p_tresh::Float64: pvalue threshold for a SNP to be considered associated to an exposure (default is 1e-3)
r2_tresh::Float64: maximial corrlation between to SNPs (default is 0.1)
exposure_filtered::Bool : If true, the exposure files are considered already filtered will not filtered on distance to tss and level of significance (default is false)
mr_methods::AbstractVector{Function} : Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw, mr_egger, mr_wm and mr_wald. default value is [mr_ivw, mr_egger]
α::Float64 : α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)
trsf_pval_exp::Union{Function, Nothing} : Transformation to apply to pvalues in exposure dataset
trsf_pval_out::Union{Function, Nothing} : t = Transormation to apply on pvalues in outcome dataset
low_ram::Bool : If true, if the exposure files each contain only one exposure trait, mrStudyNFolds with n_folds of 10 will be used. write_ivs::AbstractString : write selected Instrumental variables to specified directory
write_filtered_exposure::AbstractString : write a filtered version of exposure files to specified file name. This file will be tab separated and will only contain columns necessary for further MR Studies.
pval_bigfloat::Bool : use true if pvalues can be under 5e-324. (default is false)
filter_beta_raio::Real : Filter IVs for which the exposure effect is filter_beta_raio times outcome effect or greater. default is 0.
infos::Bool : If true, infos about advancement compute are printed to terminal (default is true)

Examples

results = mrStudy(gwas1, gwas2, genotypes, 10, approach = "naive", trsf_pval_exp = x -> exp10.(x))
source

Perform a Trans-Mendelian Randomization study with exposure GWAS and outcome QTL

arguments :

exposure::GWAS : exposure GWAS data
outcome::QTLStudy : outcome QTL data
bedbimfam_dirnames::AbstractArray{<:AbstractString} : base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :

approach::String: name of MR study aproach chosen (either naive, test or MiLoP) (default is "naive")
p_tresh::Float64: pvalue threshold for a SNP to be considered associated to an exposure (default is 1e-3)
r2_tresh::Float64: maximial corrlation between to SNPs (default is 0.1)
mr_methods::AbstractVector{Function} : Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw, mr_egger, mr_wm and mr_wald. default value is [mr_ivw, mr_egger]
α::Float64 : α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)
trsf_pval_exp::Union{Function, Nothing} : Transformation to apply to pvalues in exposure dataset
trsf_pval_out::Union{Function, Nothing} : t = Transormation to apply on pvalues in outcome dataset
low_ram::Bool : If true, if the exposure files each contain only one exposure trait, mrStudyNFolds with n_folds of 10 will be used. write_ivs::AbstractString : write selected Instrumental variables to specified directory
pval_bigfloat::Bool : use true if pvalues can be under 5e-324. (default is false)
filter_beta_raio::Real : Filter IVs for which the exposure effect is filter_beta_raio times outcome effect or greater. default is 0.
infos::Bool : If true, infos about advancement compute are printed to terminal (default is true)

source
LaScaMolMR.mrStudyNFoldsFunction

Perform Mendelian Randomization study from QTL to GWAS and separating exposure in n folds to avoid loading full dataset in memory. (works if separated in multiple files only)

arguments :

exposure::QTLStudy : exposure QTL data
outcome::GWAS : outcome GWAS data
type::AbstractString : specifies if the analysis is a trans or cis MR analysis (is either "trans" or "cis")
bedbimfam_dirnames::AbstractArray{<:AbstractString} : base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)

options :

mrStudyNFolds contains the same options as mrStudy
n_folds::Integer : Number f folds to separate QTl into.

see mrStudy for other options and examples.

source

Perform Mendelian Randomization study from GWAS to QTL and separating outcome in n folds to avoid loading full dataset in memory. (works if separated in multiple files only)

arguments :

exposure::GWAS : exposure GWAS data
outcome::QTLStudy : outcome QTL data
bedbimfam_dirnames::AbstractArray{<:AbstractString} : base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)

options :

mrStudyNFolds contains the same options as mrStudy
n_folds::Integer : Number f folds to separate QTl into.

see mrStudy for other options and examples.

source

Linkage Desiquilibrium and MR

The ClumpAndMR function performs clumping over every pair of (exposure, outcome) and calls Mendelian randomization fonctions. You can thus use it if potential IV selection was already performed.

LaScaMolMR.ClumpAndMRFunction

Keep independant IVs and perform MR in Omic-wide MR (Trans or Cis) Returns a Dataset of results for each exposure (rows).

arguments :

data : grouped dataset (exposure trait) with specific column names : :trait, :chr, :pos, :β_exp, β_out, :se_exp, :se_out among others.
GenotypesArr::AbstractVector{SnpData} : Reference genotypes. (see SnpArrays) formated with formatSnpData!.

options :

one_file_per_chr_plink::Bool : If true, it is considered that the each index in GenotypesArr corresponds to a chromosome number (default is true)
r2_tresh::AbstractFloat : The maximal r² correlation coefficient for two snps to be considered independant. (default is 0.1)
mr_methods::AbstractVector{Function}: Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw, mr_egger, mr_wm and mr_wald. default value is [mr_ivw, mr_egger]
α:AbstractFloat : α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)
write_ivs::AbstractString : write selected Instrumental variables to specified directory

Examples :

res_d = ClumpAndMR(d, GenotypesArr, r2_tresh = 0.1)
source

Mendelian Randomization functions

LaScaMolMR.mr_outputType

Struct encapsulating the outputs of a Mendelian Randomization analysis

fields :

nivs : number of ivs included in regression
effect : effect size estimate ̂γ₀
se_effect : standard error of effect size estimate
ci_low : lower bound of confidence interval for effect size
ci_high : higher bound of confidence interval for effect size
p : effect size p-value for H₀ : γ₀ = 0
intercept : intercept estimate ̂γ₁
p_intercept : p-value for H₀ : γ₁ = 0
ci_low_intercept : lower bound of confidence interval for intercept
ci_high_intercept : higher bound of confidence interval for intercept
heter_stat : heterogenetity statistic corresponding to Cochran ̂t
heter_p : heterogeneity p-value for H₀ : t = 0

source
LaScaMolMR.mr_waldFunction

Wald ratio for Mendelian Randomization with a single instrumental variable y is outcome, x is exposure

arguments :

β_Y : outcome effect sizes

se_β_Y : standard error of outcome effect size

β_X : exposure effect size

α : α value for confidence intervals (default is 0.05)

source

Wald ratio for Mendelian Randomization with a single instrumental variable y is outcome, x is exposure

arguments :

β_Y : vector of outcome effect sizes

se_β_Y : vector of standard error of outcome effect sizes

β_X : vector of exposure effect sizes

se_β_X : vector of standard error for exposure effect sizes

α : α value for confidence intervals (default is 0.05)

source
LaScaMolMR.mr_ivwFunction

Inverse variance weighted linear regression with simple weights (se(B_Y)^-2) Mendelian Randomization Y is outcome, X is exposure

currently waiting for GLM.jl PR#487 to be merged to use analytical weights instead of doing calculations twice...

arguments :

β_Y : vector of outcome effect sizes

se_β_Y : vector of standard error of outcome effect sizes

β_X : vector of exposure effect sizes

se_β_X : vector of standard error for exposure effect sizes

α : α value for confidence intervals (default is 0.05)

options :

model::Symbol : :random, :fixed or :default effect type. :default is fixed if less than 4 ivs, random otherwise.

source
LaScaMolMR.mr_eggerFunction

Egger Mendelian Randomization Y is outcome, X is exposure

currently waiting for GLM.jl PR#487 to be merged to use analytical weights instead of doing calculations twice...

arguments :

β_Y : vector of outcome effect sizes

se_β_Y : vector of standard error of outcome effect sizes

β_X : vector of exposure effect sizes

se_β_X : vector of standard error for exposure effect sizes

α : α value for confidence intervals (default is 0.05)

source
LaScaMolMR.mr_wmFunction

Weighted Median Mendelian Randomization (Bowden et al., 2015) Y is outcome, X is exposure.

arguments :

β_Y : vector of outcome effect sizes

se_β_Y : vector of standard error of outcome effect sizes

β_X : vector of exposure effect sizes

se_β_X : vector of standard error for exposure effect sizes

α : α value for confidence intervals (default is 0.05)

options :

iterations : number of iterations for estimate of standrard error or effect size. seed : seed of random generator

source

Exported Utilities

Inputs

LaScaMolMR.nfoldsFunction

Partitionate QTLStudy in n folds. Returns a vector of QTLStudy in which each element contains a subsets of file paths and corresponding traits.

arguments :

x::QTLStudy : the qtl files and informations (see QTLStudy)
m::Integer : the number of folds

source

LD

LaScaMolMR.formatSnpData!Function

format Genotype information contained in SnpData for optimised snp search based on chromosome position. Adds a column of tuple (chr::Int8, pos::Int) in snpinfo and sorts snpinfo accordingly. returns nothing

Examples

ref = SnpData(datadir("some/data"))

formatSnpData!(ref)

Note this function does not take into account the user might want to write SnpData in bed, bim, fam format. The changes done by this function can and will cause problems if writing plink files after calling formatSnpData!.

source
LaScaMolMR.clumpFunction

threaded implementation of the clumping algorithm prioritising first snps in given Vector and formated Genotypes SnpData (see formatSnpData!) returns a vector of boolean indicating if each given snp is kept

arguments :

ref_genotypes : reference SnpData (SnpArrays)
snps : vector of chromosome position tuple for each variant

options :

formated : indicates if ref SnpData is already formated according to :chrpos (chr, pos)
`r2
tresh` : minimal r² for 2 snps to be considered strongly correlated.

Examples :

julia> ref = SnpData(datadir("some/data"));

julia> kept_v_b::Vector{Bool} = clump([(1, 123), (1, 456), (1, 789)], ref)
3-element Vector{Int64}:
 1
 0
 1

julia> formatSnpData!(ref);

julia> kept_v_b::Vector{Bool} = clump([(1, 123), (1, 456), (1, 789)], ref, formated = true)
3-element Vector{Int64}:
 1
 0
 1

If formatSnpData has already been called on good snp info type (:chr_pos or :snpid), formated = true option does not verify or modify formating. See formatSnpData!.

source

Contents

Authors

Samuel Mathieu, Hippolyte Minvielle Moncla, Mewen Briend, Valentine Duclos, Anne Rufiange, Patrick Mathieu