EMfit_betabinom_popcomb.Rd
EMfit_betabinom_popcomb
, similar to EMfit_betabinom
, fits a beta-binomial mixture model to input reference- and allele RNAseq counts,
but with those counts originating from two different populations (a control- and case-population). All distribution models are jointly fitted except
for the heterozygous overdispersion parameters, which is allowed to differ between the populations. The main purpose of this fit is getting this model's
likelihood to be used in a likelihood ratio test, checking for statistical evidence that this heterozygous overdispersion parameter
truly differs between the populations, which would indicate differential allelic divergence.
EMfit_betabinom_popcomb(
data_counts,
allelefreq = 0.5,
SE,
inbr = 0,
dltaco = 10^-6,
HWE = FALSE,
p_InitEst = FALSE,
p_inits = c(1/3, 1/3, 1/3),
ThetaInits = "moment",
ReEstThetas = "moment",
NoSplitHom = TRUE,
NoSplitHet = TRUE,
ResetThetaMin = 10^-10,
ResetThetaMax = 10^-1,
thetaTRY = c(10^-1, 10^-3, 10^-7),
probshift_init = 0.5,
FirstFewFixed = NULL,
MemLim = 2048,
Xtra = 5,
epsabs = 0.001,
MaxIt = 100
)
Data frame. Data frame of a SNP with reference and variant counts ("ref_count" and "var_count", respectively) for each sample ("sample"). Also required is the "isCase" column, equaling 1 for case-samples and 0 for control-samples
Number. Allele frequency. Only used when pInitEst is TRUE (default = 0.5)
Number. Sequencing error rate.
Number. Degree of inbreeding (default = 0).
Number. Minimal difference between 2 iterations (default = 0.001).
Logical. Should HWE be used for allele frequency estimation, not recommended (default = FALSE).
Logical. Calculate initial estimates of pr, pv and prv from allelefreq, not recommended (default = FALSE).
Numeric vector, "moment" or "TryThree". A vector of length two containing the initial dispersion estimate for the homozygous peaks, followed by the one for the heterozygous peak,
or "moment", in which case moment initial estimates are generated. In case of "TryThree", three initial values given by the thetaTRY
input argument are tried out and the one yielding the best fit is retained.
String. Accepts the methods "moment" or "simple" as arguments, in which case dispersions are re-estimated during every EM-step using a moment estimate or a simple custom procedure respectively, to be used as a starting
Logical. If TRUE, don't allow the beta-binomial fits for homozygotes to be bimodal
Logical. If TRUE, don't allow the beta-binomial fit for heterozygotes to be bimodal
Number. Initial theta values in numeric optimization get capped at this minimum (e.g. in case the moment estimate is even lower)
Number. Initial theta values in numeric optimization get capped at this maximum (e.g. in case the moment estimate is even higher)
Numeric vector. See ReEstThetas
, in case this argument equals "TryThree".
Number. Initial estimate position heterozygous peak (allelic bias; expected fraction of reference reads)
A list containing the following components:
A vector containing fitted model parameters: genotype frequencies, heterozygous peak position (allelic bias), and overdispersion of the homozygous peaks and of the separate heterozygous control- and case-peaks.
A dataframe containing per-samples genotypes likelihoods according to the fitted model.
A character vector containing the most likely per-sample genotypes according to the fitted model.
The number of iterations.
An "!" indicates a bad quality locus or fit (no apparent heterozygotes)