EMfit_betabinom.Rd
EMfit_betabinom
estimates, per locus, parameters of an assumed beta-binomial mixture model via expectation maximization (genotype
frequencies, allelic bias i.e. heterozygous pi parameter, homozygous and heterozygous allelic divergence i.e. their respective theta- (overdispersion) parameters.
Through this fit, per-samples genotype probabilities are obtained as well. Optionally, a fit assuming no allelic bias (heterozygous pi-parameter = 0.5) is
performed as well, for the purpose of significant allelic bias detection via likelihood ratio test.
EMfit_betabinom(
data_counts,
allelefreq = 0.5,
SE,
inbr = 0,
dltaco = 10^-6,
HWE = FALSE,
p_InitEst = FALSE,
probshift_init = NULL,
ThetaInits = "moment",
ReEstThetas = "moment",
NoSplitHom = TRUE,
NoSplitHet = TRUE,
ResetThetaMin = 10^-10,
ResetThetaMax = 10^-1,
thetaTRY = c(10^-1, 10^-3, 10^-7),
fitH0 = TRUE,
FirstFewFixed = NULL,
epsabs = 0.001,
MaxIt = 100
)
Data frame. Data frame of a SNP with reference and variant counts ("ref_count" and "var_count", respectively) for each sample ("sample").
Number. Allele frequency. Only used when pInitEst is TRUE (default = 0.5)
Number. Sequencing error rate.
Number. Degree of inbreeding (default = 0).
Number. Minimal difference between 2 iterations (default = 0.001).
Logical. Should HWE be used for allele frequency estimation, not recommended (default = FALSE).
Logical. Calculate initial estimates of pr, pv and prv from allelefreq, not recommended (default = FALSE).
Number. Initial estimate for the allelic bias in heterozygotes; it's recommended to leave this at NULL, in which case this initial estimate will be 0.5
Numeric vector, "moment" or "TryThree". A vector of length two containing the initial dispersion estimate for the homozygous peaks, followed by the one for the heterozygous peak,
or "moment", in which case moment initial estimates are generated. In case of "TryThree", three initial values given by the thetaTRY
input argument are tried out and the one yielding the best fit is retained.
String. Accepts the methods "moment" or "simple" as arguments, in which case dispersions are re-estimated during every EM-step using a moment estimate or a simple custom procedure respectively, to be used as a starting
Logical. If TRUE, don't allow the beta-binomial fits for homozygotes to be bimodal
Logical. If TRUE, don't allow the beta-binomial fit for heterozygotes to be bimodal
Number. Initial theta values in numeric optimization get capped at this minimum (e.g. in case the moment estimate is even lower)
Number. Initial theta values in numeric optimization get capped at this maximum (e.g. in case the moment estimate is even higher)
Numeric vector. See ReEstThetas
, in case this argument equals "TryThree".
Logical. If TRUE, performs a fit assuming no allelic bias (heterozygous pi-parameter = 0.5) and a subsequent likihood ratio test for significant allelic bias.
A list containing the following components:
The estimated Allelic Bias (estimated reference read fraction for heterozygous samples).
The test statistic of the likelihood ratio test against perfectly balanced Allelic Bias (0.5). Not included if fitH0==FALSE
.
The p-value of the likelihood ratio test against perfectly balanced Allelic Bias (0.5). Not included if fitH0==FALSE
.
The number of iterations.
Indicates the quality of a locus. An "!" indicates bad data or a bad fit due to no apparent heterozygosity.
The variant homozygote genotype frequency for the locus.
The reference homozygote genotype frequency for the locus.
The heterozyous genotype frequency for the locus.
The variant homozygote genotype frequency with Allelic Bias = 0.5. Not included if fitH0==FALSE
.
The reference homozygote genotype frequency with Allelic Bias = 0.5. Not included if fitH0==FALSE
.
The heterozyous genotype frequency with Allelic Bias = 0.5. Not included if fitH0==FALSE
.
Data frame. Input data frame with extra columns: allelefreq, (most likely) genotype, genotype probabilities (prr, prv, pvv), and whether the data point is an outlier ("Outlier", equals 1 if so) per sample.
Final overdispersion estimate for the homozygous peaks.
Final overdispersion estimate for the heterozygous peak.
Final overdispersion estimate for the homozygous peaks with Allelic Bias = 0.5. Not included if fitH0==FALSE
.
Final overdispersion estimate for the heterozygous peak with Allelic Bias = 0.5. Not included if fitH0==FALSE
.
Various Goodness-Of-Fit heuristics.