LaScaMolMR
Overview
LaScaMolMR.jl (Large Scale Molecular Mendelian Randomization) is a threaded Mendelian Randomization (MR) package that is focused on the generation of transcriptome wide / molecular MR analysies. Although it provides an interface for most common MR regression estimators (Inverse Variance Weighted, Weighted Median, Egger, Wald), its intended use is to enable fast Omic-wide Mendelian Randomization studies. The rise of large genetic cohort data has benefited the statistical power of Genome Wide Association Studies (GWAS) and Quantitative Trait Loci (QTL). Thus enabling findings in extensive studies such as Transcriptome Wide MR (TWMR). LaScaMolMR.jl provides a fast and efficient framework to such analyses, allowing users to customize the parameters of the study.
Tutorial
Example 1 : Cis-MR
For a QTL dataset composed as follows :
base/folder
├── exposureA
│ ├── exposureA_chr1.txt
│ └── exposureA_chr2.txt
├── exposureB
│ ├── exposureB_chr1.txt
│ └── exposureB_chr2.txt
└── exposureC
├── exposureC_chr3.txt
└── exposureC_chr4.txt
With example data composed like this (tab separated) :
chr pos effect_allele other_allele beta se some_other_column pval
1 10511 A G -0.176656 0.136297 . 0.194939
1 10642 A G -0.724554 0.345390 .. 0.035924
1 11008 G C -0.017786 0.016673 ... 0.286088
1 11012 G C -0.017786 0.016673 .. 0.286088
1 13110 A G 0.013272 0.021949 . 0.545411
1 13116 G T -0.027802 0.013111 .. 0.0339672
1 13118 G A -0.027802 0.013111 ... 0.0339672
1 13259 A G -0.122207 0.210776 .. 0.562052
1 13273 C G 0.007077 0.015337 . 0.644463
and a GWAS of outcome composed similarly but comma separated, the following code will generate a cis-MR study with default parameters :
using LaScaMolMR
- Describe the exposure data structure (files and columns).
path_pattern = ["exposure", TRAIT_NAME, "/exposure", TRAIT_NAME, "_chr", CHR, ".txt"]
columns = Dict(1 => CHR, 2 => POS, 3 => A_EFFECT, 4 => A_OTHER, 5 => BETA, 6 => SE, 8 => PVAL)
trait_v = ["A", "B", "C"] # exposure trait identifiers
chr_v = [1, 2, 3] # chromosome corresponding to exposure protein or gene
tss_v = [45287, 984276, 485327765] # position of TSS, if relevant
exposure::QTLStudy = QTLStudy_from_pattern("base/folder/",
path_pattern,
trait_v, chr_v, tss_v,
columns, separator = '\t')
- Describe the outcome data.
outcome = GWAS("/some/file", columns, separator = ',', trait_name = "Some Painful Disease")
- Perform Medelian Randomization study by providing input formats and reference genotype data files.
# Plink 1.9 files base names for each chromosome (You can also use a single file)
plink_files = ["folder/basename_file_chr$(i)" for i in 1:22]
# Perform MR for every exposure - outcome pairs with default parameters
out_table = mrStudy(exposure, outcome, "cis", plink_files)
# with MiLoP approach to internal pleiotropy and other parameters :
out_table_MiLoP = mrStudy(exposure, outcome, "cis", plink_files,
approach = "MiLoP",
r2_tresh = 0.01,
p_tresh = 5e-8,
p_tresh_MiLoP = 5e-4,
pval_bigfolat = true)
Example 2 : Trans-MR
For a QTL dataset composed as follows :
base/folder
├── exposureA.txt
├── exposureB.txt
└── exposureC.txt
With example data composed like this (tab separated) :
chr pos A1 A2 beta se some_useless_column pval
1 10511 A G -0.176656 0.136297 . 0.194939
1 10642 A G -0.724554 0.345390 .. 0.035924
1 11008 G C -0.017786 0.016673 ... 0.286088
1 11012 G C -0.017786 0.016673 .. 0.286088
1 13110 A G 0.013272 0.021949 . 0.545411
1 13116 G T -0.027802 0.013111 .. 0.0339672
1 13118 G A -0.027802 0.013111 ... 0.0339672
1 13259 A G -0.122207 0.210776 .. 0.562052
1 13273 C G 0.007077 0.015337 . 0.644463
and a GWAS of outcome composed similarly but comma separated, the following code will generate a trans-MR study with default parameters :
using LaScaMolMR
- Describe exposure data.
path_pattern = ["exposure", TRAIT_NAME, ".txt"]
columns = Dict(1 => CHR, 2 => POS, 3 => A_EFFECT, 4 => A_OTHER, 5 => BETA, 6 => SE, 8 => PVAL)
trait_v = ["A", "B", "C"] # Chromosome and TSS informations are not relevant in Trans setting.
exposure::QTLStudy = QTLStudy_from_pattern("base/folder/",
path_pattern,
trait_v, chr_v = nothing,
tss_v = nothing,
columns, separator = '\t',
only_corresp_chr = false)
- Describe outcome data.
outcome = GWAS("/some/file", columns, separator = ',', trait_name = "Some Painful Disease")
- Perform Medelian Randomization study by providing input formats and reference genotype data files.
# Plink 1.9 files base names for each chromosome (You can also use a single file)
plink_files = ["folder/basename_file_chr$(i)" for i in 1:22]
# Perform MR for every exposure - outcme pairs with default parameters
out_table = mrStudy(exposure, outcome, "trans", plink_files)
# with MiLoP approach and other parameters :
out_table2 = mrStudy(exposure, outcome, "trans", plink_files,
approach = "MiLoP",
r2_tresh = 0.01,
p_tresh = 5e-8,
filter_beta_ratio = 1)
# Default p_tresh_MiLoP value is the same as p_tresh
MiLoP approach
Mitigated Local Pleiotropy (MiLoP) approach modifies the potential IV selection process to remove instrument variables associated to more than 1 exposure at p_tresh_MiLoP
significance level.
mr_*
regression functions
LaScaMolMR provides four MR regression functions : mr_wald
, mr_ivw
, mr_egger
and mr_wm
performing respectively the wald ratio of the first provided instrument, Inverse Variance Weighted, Egger and Weighted Median regressions.
Example :
out = mr_wm(beta_outcome, se_outcome, beta_exposure, se_exposure;
iteration = 10000, seed = 42, α = 0.05)
mr_output
as a standard output for mr_*
:
The mroutput struct provies a standard interface to functions performing mendelian randomization. This allows for user to use its own MR functions for mrStudy
. Any function receiving 4 vectors, having at least $\alpha$ as an option and outputing a `mroutput` is valid.
Here is the definitioin of mr_output
:
struct mr_output
nivs::Int
effect::Float64
se_effect::Float64
ci_low::Float64
ci_high::Float64
p::Float64
intercept::Float64
p_intercept::Float64
ci_low_intercept::Float64
ci_high_intercept::Float64
heter_stat::Float64
heter_p::Float64
end
Any function following this format could be provided to ClumpAndMR
/mrStudy
in the mr_methods
option, including user-defined functions.
function mr_something(beta_outcome, se_outcome, beta_exposure, se_exposure;
α = 0.05, other_params = default_value
)::mr_output
# Calculate values
...
# end
# Not calculated values can be replaced by NaN
return mr_ouput(nivs, effect, ci_low, ci_high, p, NaN, NaN, NaN, NaN, heter_stat, heter_p)
end
output = NaiveCis(data, genotypes; mr_methods = [mr_ivw, mr_something])
API
Index
LaScaMolMR.GWAS
LaScaMolMR.GenVarInfo
LaScaMolMR.QTLStudy
LaScaMolMR.mr_output
LaScaMolMR.ClumpAndMR
LaScaMolMR.QTLStudy_from_pattern
LaScaMolMR.clump
LaScaMolMR.formatSnpData!
LaScaMolMR.mrStudy
LaScaMolMR.mrStudyNFolds
LaScaMolMR.mr_egger
LaScaMolMR.mr_ivw
LaScaMolMR.mr_wald
LaScaMolMR.mr_wm
LaScaMolMR.nfolds
Provide inputs
These functions allow users to specify the input formats. The QTL summary statistics can be united in a single file or divided by exposure, chromosome or both.
For the sake of simplicity, we consider a genetic association study as a GWAS when a single phenotype was studied and as a QTLStudy otherwise.
LaScaMolMR.GenVarInfo
— TypeEnum of different information present in qtl file path or inside QTL/GWAS text delimited files.
values :
TRAIT_NAME # Name of the trait (e.g. QTL protein name)
CHR # chromosome
POS # position in chromosome
A_EFFECT # effect allele of SNP
A_OTHER # reference allele of SNP
BETA # effect size of SNP
SE # standard error of effect size
PVAL # Pvalue for H₀ : BETA = 0
LaScaMolMR.GWAS
— TypeType GWAS which contains informations about GWAS file
path # path to file
columns # Dictionary of column number => GenVarInfo type of info at column
separator # column separator
trait_name # name of the trait (nothing if not specified)
Example
gwas = GWAS("path/to/file",
Dict(1 => PVAL, 2 => CHR, 3 => POS, 8 => BETA, 9 => SE),
',')
LaScaMolMR.QTLStudy
— TypeType QTLStudy which contains informations about QTL file(s) format and implacement.
The method QTLStudy_from_pattern
helps building information from patterns in file names and is the prefered methods to construct a QTLStudy.
fields :
path_v # vector of file names (path)
traits_for_each_path # Trait corrpespondinf to each path (nothing if files are not trait specific)
trait_v # names of all traits included
chr_v # chromosomes corresponding to each trait (nothing if not relevant)
tss_v # tss position corresponding to each trait (nothing if not relevant)
columns # Dictionary of column number => GenVarInfo type of info at column
separator # column separator
LaScaMolMR.QTLStudy_from_pattern
— FunctionMake QTLStudy which contains informations about QTL file(s) format and implacement from some file pattern
arguments :
folder
is the main folder cntaining all QTL files. path_pattern
is a vector of strngs and GenVarInfo characterizing the pattern of every file name in the folder. trait_v
is an collection of traits that should be incuded in future analysis. chr_v
is a vector of each chromosome of each QTL trait. tss_v
is a vector of every Transcription start site. columns
is a dictionary of all informations contained in columns of QTL files. separator
is the separator of QTL files. only_corresp_chr
indicates if only variants on chromosome correspoding to QTL trait should be kept. Default is true
. Set to false
for Trans MR studies.
TIP : if your file contains a chromosome:position column (e.g. 1:324765), consider setting your separator to [':', sep]
Examples
for a sigle file QTL :
QTLStudy_from_pattern("some/folder", ["qtl.txt"], tv, cv, tssv,
Dict(1 => CHR, 2 => POS, 8 => PVAL, ...),
separator = ' ',
only_corresp_chr = true)
for this arhitecture (in UNIX) :
test/data
├── exposureA
│ ├── exposureA_chr1.txt
│ └── exposureA_chr2.txt
├── exposureB
│ ├── exposureB_chr1.txt
│ └── exposureB_chr2.txt
└── exposureC
├── exposureC_chr3.txt
└── exposureC_chr4.txt
we get this code :
path_pattern = ["exposure", TRAIT_NAME, "/exposure", TRAIT_NAME, "_chr", CHR, ".txt"]
trait_v = ["A", "B", "C"]
chr_v = [1, 2, 3]
tss_v = [45287, 984276, 485327765]
qtl = QTLStudy_from_pattern("test/data", path_pattern, ttrait_v, chr_v, tss_v,
Dict(1 => CHR, 2 => POS, 8 => PVAL, ...),
separator = ' ',
only_corresp_chr = true)
Perform a MR Study
The function mrStudy
allows the users to perform meta-analysis over many pairs of traits given the input format. The function has three implementations depending on the type of studies used as exposure and outcomes. (GWAS -> GWAS, QTL -> GWAS, GWAS -> QTL)
LaScaMolMR.mrStudy
— FunctionPerform a Mendelian Randomization study with exposure QTL and outcome GWAS
arguments :
exposure::QTLStudy
: exposure QTL data outcome::GWAS
: outcome gas data type::AbstractString
: specifies if the analysis is a trans or cis MR analysis (is either "trans"
or "cis"
)bedbimfam_dirnames::AbstractArray{<:AbstractString}
: base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :
approach::String
: name of MR study aproach chosen (either naive, test or MiLoP) (default is "naive")p_tresh::Float64
: pvalue threshold for a SNP to be considered associated to an exposure (default is 1e-3)window::Integer
: maximal distance between a potential Instrument Variable and transciption start site of gene exposure (default is 500000)
`r2tresh::Float64`: maximial corrlation between to SNPs (default is 0.1)exposure_filtered::Bool
: If true, the exposure files are considered already filtered will not filtered on distance to tss and level of significance (default is false)mr_methods::AbstractVector{Function}
: Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output
can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw
, mr_egger
, mr_wm
and mr_wald
. default value is [mr_ivw, mr_egger]
α::Float64
: α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)trsf_pval_exp::Union{Function, Nothing}
: Transformation to apply to pvalues in exposure datasettrsf_pval_out::Union{Function, Nothing}
: t = Transormation to apply on pvalues in outcome datasetlow_ram::Bool
: If true, if the exposure files each contain only one exposure trait, mrStudyNFolds
with n_folds
of 10 will be used.write_ivs::AbstractString
: write selected Instrumental variables to specified directorywrite_filtered_exposure::AbstractString
: write a filtered version of exposure files to specified file name. This file will be tab separated and will only contain columns necessary for further MR Studies.filter_beta_raio::Real
: Filter IVs for which the exposure effect is filter_beta_raio
times outcome effect or greater. default is 0.pval_bigfloat::Bool
: use true
if pvalues can be under 5e-324
. (default is false
)min_maf::Real
: minimal variant maf to be kept as a potential IV. (default is 0)infos::Bool
: If true, infos about advancement compute are printed to terminal (default is true
)
Examples
results = mrStudy(qtl, gwas, "trans", genotypes, 10, approach = "naive", window = 250000, trsf_pval_exp = x -> exp10.(x))
Perform a Trans-Mendelian Randomization study with exposure GWAS and outcome GWAS
arguments :
exposure::GWAS
: exposure QTL data outcome::GWAS
: outcome GWAS data bedbimfam_dirnames::AbstractArray{<:AbstractString}
: base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :
approach::String
: name of MR study aproach chosen (either naive, test or MiLoP) (default is "naive")p_tresh::Float64
: pvalue threshold for a SNP to be considered associated to an exposure (default is 1e-3)r2_tresh::Float64
: maximial corrlation between to SNPs (default is 0.1)exposure_filtered::Bool
: If true, the exposure files are considered already filtered will not filtered on distance to tss and level of significance (default is false)mr_methods::AbstractVector{Function}
: Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output
can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw
, mr_egger
, mr_wm
and mr_wald
. default value is [mr_ivw, mr_egger]
α::Float64
: α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)trsf_pval_exp::Union{Function, Nothing}
: Transformation to apply to pvalues in exposure datasettrsf_pval_out::Union{Function, Nothing}
: t = Transormation to apply on pvalues in outcome datasetlow_ram::Bool
: If true, if the exposure files each contain only one exposure trait, mrStudyNFolds
with n_folds
of 10 will be used. write_ivs::AbstractString
: write selected Instrumental variables to specified directorywrite_filtered_exposure::AbstractString
: write a filtered version of exposure files to specified file name. This file will be tab separated and will only contain columns necessary for further MR Studies.pval_bigfloat::Bool
: use true
if pvalues can be under 5e-324
. (default is false
)filter_beta_raio::Real
: Filter IVs for which the exposure effect is filter_beta_raio
times outcome effect or greater. default is 0.infos::Bool
: If true, infos about advancement compute are printed to terminal (default is true
)
Examples
results = mrStudy(gwas1, gwas2, genotypes, 10, approach = "naive", trsf_pval_exp = x -> exp10.(x))
Perform a Trans-Mendelian Randomization study with exposure GWAS and outcome QTL
arguments :
exposure::GWAS
: exposure GWAS data outcome::QTLStudy
: outcome QTL data bedbimfam_dirnames::AbstractArray{<:AbstractString}
: base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :
approach::String
: name of MR study aproach chosen (either naive, test or MiLoP) (default is "naive")p_tresh::Float64
: pvalue threshold for a SNP to be considered associated to an exposure (default is 1e-3)r2_tresh::Float64
: maximial corrlation between to SNPs (default is 0.1)mr_methods::AbstractVector{Function}
: Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output
can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw
, mr_egger
, mr_wm
and mr_wald
. default value is [mr_ivw, mr_egger]
α::Float64
: α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)trsf_pval_exp::Union{Function, Nothing}
: Transformation to apply to pvalues in exposure datasettrsf_pval_out::Union{Function, Nothing}
: t = Transormation to apply on pvalues in outcome datasetlow_ram::Bool
: If true, if the exposure files each contain only one exposure trait, mrStudyNFolds
with n_folds
of 10 will be used. write_ivs::AbstractString
: write selected Instrumental variables to specified directorypval_bigfloat::Bool
: use true
if pvalues can be under 5e-324
. (default is false
)filter_beta_raio::Real
: Filter IVs for which the exposure effect is filter_beta_raio
times outcome effect or greater. default is 0.infos::Bool
: If true, infos about advancement compute are printed to terminal (default is true
)
LaScaMolMR.mrStudyNFolds
— FunctionPerform Mendelian Randomization study from QTL to GWAS and separating exposure in n folds to avoid loading full dataset in memory. (works if separated in multiple files only)
arguments :
exposure::QTLStudy
: exposure QTL data outcome::GWAS
: outcome GWAS data type::AbstractString
: specifies if the analysis is a trans or cis MR analysis (is either "trans"
or "cis"
)bedbimfam_dirnames::AbstractArray{<:AbstractString}
: base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :
mrStudyNFolds contains the same options as mrStudy
n_folds::Integer
: Number f folds to separate QTl into.
see mrStudy
for other options and examples.
Perform Mendelian Randomization study from GWAS to QTL and separating outcome in n folds to avoid loading full dataset in memory. (works if separated in multiple files only)
arguments :
exposure::GWAS
: exposure GWAS data outcome::QTLStudy
: outcome QTL data bedbimfam_dirnames::AbstractArray{<:AbstractString}
: base names of Plink bedbimfam files for reference genotypes (see SnpArrays documentation)
options :
mrStudyNFolds contains the same options as mrStudy
n_folds::Integer
: Number f folds to separate QTl into.
see mrStudy
for other options and examples.
Linkage Desiquilibrium and MR
The ClumpAndMR
function performs clumping over every pair of (exposure, outcome) and calls Mendelian randomization fonctions. You can thus use it if potential IV selection was already performed.
LaScaMolMR.ClumpAndMR
— FunctionKeep independant IVs and perform MR in Omic-wide MR (Trans or Cis) Returns a Dataset of results for each exposure (rows).
arguments :
data
: grouped dataset (exposure trait) with specific column names : :trait
, :chr
, :pos
, :β_exp
, β_out
, :se_exp
, :se_out
among others. GenotypesArr::AbstractVector{SnpData}
: Reference genotypes. (see SnpArrays
) formated with formatSnpData!
.
options :
one_file_per_chr_plink::Bool
: If true, it is considered that the each index in GenotypesArr corresponds to a chromosome number (default is true) r2_tresh::AbstractFloat
: The maximal r² correlation coefficient for two snps to be considered independant. (default is 0.1) mr_methods::AbstractVector{Function}
: Functions to use to estimate effect of exposure on outcome. Any Function taking four vectors of same length (βoutcome, seoutcome, βexposure, seexposure) and a Float (α) and returns a value of type mr_output
can be used, that includes user defined functions. Functions already implemented in this module include mr_ivw
, mr_egger
, mr_wm
and mr_wald
. default value is [mr_ivw, mr_egger]
α:AbstractFloat
: α value for confidance intervals of parameter estimations in MR (e.g. 95% CI is α = 0.05, which is the default value)write_ivs::AbstractString
: write selected Instrumental variables to specified directory
Examples :
res_d = ClumpAndMR(d, GenotypesArr, r2_tresh = 0.1)
Mendelian Randomization functions
LaScaMolMR.mr_output
— TypeStruct encapsulating the outputs of a Mendelian Randomization analysis
fields :
nivs
: number of ivs included in regression effect
: effect size estimate ̂γ₀ se_effect
: standard error of effect size estimate ci_low
: lower bound of confidence interval for effect size ci_high
: higher bound of confidence interval for effect size p
: effect size p-value for H₀ : γ₀ = 0 intercept
: intercept estimate ̂γ₁ p_intercept
: p-value for H₀ : γ₁ = 0 ci_low_intercept
: lower bound of confidence interval for intercept ci_high_intercept
: higher bound of confidence interval for intercept heter_stat
: heterogenetity statistic corresponding to Cochran ̂t heter_p
: heterogeneity p-value for H₀ : t = 0
LaScaMolMR.mr_wald
— FunctionWald ratio for Mendelian Randomization with a single instrumental variable y is outcome, x is exposure
arguments :
β_Y
: outcome effect sizes
se_β_Y
: standard error of outcome effect size
β_X
: exposure effect size
α
: α value for confidence intervals (default is 0.05)
Wald ratio for Mendelian Randomization with a single instrumental variable y is outcome, x is exposure
arguments :
β_Y
: vector of outcome effect sizes
se_β_Y
: vector of standard error of outcome effect sizes
β_X
: vector of exposure effect sizes
se_β_X
: vector of standard error for exposure effect sizes
α
: α value for confidence intervals (default is 0.05)
LaScaMolMR.mr_ivw
— FunctionInverse variance weighted linear regression with simple weights (se(B_Y)^-2) Mendelian Randomization Y is outcome, X is exposure
currently waiting for GLM.jl PR#487 to be merged to use analytical weights instead of doing calculations twice...
arguments :
β_Y
: vector of outcome effect sizes
se_β_Y
: vector of standard error of outcome effect sizes
β_X
: vector of exposure effect sizes
se_β_X
: vector of standard error for exposure effect sizes
α
: α value for confidence intervals (default is 0.05)
options :
model::Symbol
: :random
, :fixed
or :default
effect type. :default
is fixed if less than 4 ivs, random otherwise.
LaScaMolMR.mr_egger
— FunctionEgger Mendelian Randomization Y is outcome, X is exposure
currently waiting for GLM.jl PR#487 to be merged to use analytical weights instead of doing calculations twice...
arguments :
β_Y
: vector of outcome effect sizes
se_β_Y
: vector of standard error of outcome effect sizes
β_X
: vector of exposure effect sizes
se_β_X
: vector of standard error for exposure effect sizes
α
: α value for confidence intervals (default is 0.05)
LaScaMolMR.mr_wm
— FunctionWeighted Median Mendelian Randomization (Bowden et al., 2015) Y is outcome, X is exposure.
arguments :
β_Y
: vector of outcome effect sizes
se_β_Y
: vector of standard error of outcome effect sizes
β_X
: vector of exposure effect sizes
se_β_X
: vector of standard error for exposure effect sizes
α
: α value for confidence intervals (default is 0.05)
options :
iterations
: number of iterations for estimate of standrard error or effect size. seed
: seed of random generator
Exported Utilities
Inputs
LaScaMolMR.nfolds
— FunctionPartitionate QTLStudy in n folds. Returns a vector of QTLStudy in which each element contains a subsets of file paths and corresponding traits.
arguments :
x::QTLStudy
: the qtl files and informations (see QTLStudy
) m::Integer
: the number of folds
LD
LaScaMolMR.formatSnpData!
— Functionformat Genotype information contained in SnpData for optimised snp search based on chromosome position. Adds a column of tuple (chr::Int8, pos::Int) in snpinfo and sorts snpinfo accordingly. returns nothing
Examples
ref = SnpData(datadir("some/data"))
formatSnpData!(ref)
Note this function does not take into account the user might want to write SnpData in bed, bim, fam format. The changes done by this function can and will cause problems if writing plink files after calling formatSnpData!
.
LaScaMolMR.clump
— Functionthreaded implementation of the clumping algorithm prioritising first snps in given Vector and formated Genotypes SnpData (see formatSnpData!
) returns a vector of boolean indicating if each given snp is kept
arguments :
ref_genotypes : reference SnpData (SnpArrays)
snps : vector of chromosome position tuple for each variant
options :
formated
: indicates if ref SnpData is already formated according to :chrpos (chr, pos)
`r2tresh` : minimal r² for 2 snps to be considered strongly correlated.
Examples :
julia> ref = SnpData(datadir("some/data"));
julia> kept_v_b::Vector{Bool} = clump([(1, 123), (1, 456), (1, 789)], ref)
3-element Vector{Int64}:
1
0
1
julia> formatSnpData!(ref);
julia> kept_v_b::Vector{Bool} = clump([(1, 123), (1, 456), (1, 789)], ref, formated = true)
3-element Vector{Int64}:
1
0
1
If formatSnpData has already been called on good snp info type (:chr_pos
or :snpid
), formated = true
option does not verify or modify formating. See formatSnpData!
.
Contents
Authors
Samuel Mathieu, Hippolyte Minvielle Moncla, Mewen Briend, Valentine Duclos, Anne Rufiange, Patrick Mathieu