EMfit_binom estimates, per locus, the parameters of an assumed binomial mixture model via expectation maximization: the genotype frequencies and the allelic bias (i.e. the heterozygous p-parameter). The fit also yields per-sample genotype probabilities. A second fit assuming no allelic bias (heterozygous p-parameter = 0.5) is performed as well, so that significant allelic bias can be detected via a likelihood ratio test.
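To make the mixture model concrete, the following is a minimal sketch of how per-sample genotype probabilities could be derived from read counts once the parameters are known. It is not the package's implementation: the helper `genotype_posterior` is hypothetical, and the parameterisation (reference-read probability of 1 - SE for the reference homozygote, SE for the variant homozygote, and AB for the heterozygote) is an assumption made for illustration.

```r
# Hypothetical helper, not part of the package: posterior genotype
# probabilities for one sample under a three-component binomial mixture.
genotype_posterior <- function(ref_count, var_count, SE, AB, freqs) {
  n <- ref_count + var_count
  # Assumed probability of drawing a reference read under each genotype.
  p_ref <- c(rr = 1 - SE, rv = AB, vv = SE)
  # Binomial likelihood of the observed reference count per genotype.
  lik <- dbinom(ref_count, size = n, prob = p_ref)
  # Weight by the genotype frequencies and normalise (Bayes' rule).
  post <- freqs * lik
  post / sum(post)
}

post <- genotype_posterior(
  ref_count = 12, var_count = 9, SE = 0.002, AB = 0.5,
  freqs = c(rr = 0.25, rv = 0.5, vv = 0.25)
)
print(round(post, 4))
```

In an EM scheme, a step like this would alternate with re-estimating the genotype frequencies and the heterozygous p-parameter from the resulting probabilities.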

EMfit_binom(
  data_counts,
  SE,
  allelefreq = 0.5,
  inbr = 0,
  dltaco = 0.001,
  HWE = FALSE,
  p_InitEst = FALSE
)

Arguments

data_counts

Data frame. Data frame of a SNP with reference and variant counts ("ref_count" and "var_count", respectively) for each sample ("sample").

SE

Number. Sequencing error rate.

allelefreq

Number. Allele frequency. Only used when p_InitEst is TRUE (default = 0.5).

inbr

Number. Degree of inbreeding (default = 0).

dltaco

Number. Minimal difference between two successive iterations; used as the convergence criterion for the expectation maximization (default = 0.001).

HWE

Logical. Should Hardy-Weinberg equilibrium (HWE) be used for allele frequency estimation? Not recommended (default = FALSE).

p_InitEst

Logical. Should initial estimates of pr, pv and prv be calculated from allelefreq? Not recommended (default = FALSE).
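As an illustration of the expected input, the sketch below simulates counts for a single SNP and builds a data frame with the "sample", "ref_count" and "var_count" columns described for data_counts. The simulation settings (30 samples, coverage around 40, a sequencing error rate of 0.002) are arbitrary assumptions, and the call to EMfit_binom is guarded so that the block also runs in a session where the function is not loaded.

```r
# Simulated input for one SNP; all settings here are illustrative assumptions.
set.seed(1)
n_samples <- 30
coverage  <- rpois(n_samples, lambda = 40) + 1
genotype  <- sample(c("rr", "rv", "vv"), n_samples, replace = TRUE,
                    prob = c(0.25, 0.5, 0.25))
# Assumed reference-read probability per genotype (no allelic bias).
p_ref     <- c(rr = 0.998, rv = 0.5, vv = 0.002)[genotype]
ref_count <- rbinom(n_samples, size = coverage, prob = p_ref)

data_counts <- data.frame(
  sample    = paste0("S", seq_len(n_samples)),
  ref_count = ref_count,
  var_count = coverage - ref_count
)

# Only attempt the fit if EMfit_binom is available in this session.
if (exists("EMfit_binom")) {
  fit <- EMfit_binom(data_counts, SE = 0.002)
  print(fit$AB)    # estimated allelic bias
  print(fit$AB_p)  # p-value of the likelihood ratio test
}
```

All other arguments are left at their documented defaults here, which matches the "not recommended" notes on HWE and p_InitEst.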

Value

A list containing the following components:

AB

The estimated allelic bias (heterozygous p-parameter).

AB_lrt

The test statistic of the likelihood ratio test.

AB_p

The p-value of the likelihood ratio test.

GOF

The goodness-of-fit value based on the corrected likelihood.

nrep

The number of iterations.

quality

Indicates the quality of a locus. An "!" indicates bad data or a bad fit due to no apparent heterozygosity.

rho_vv

The variant (homozygote) genotype probability for the SNP.

rho_rr

The reference (homozygote) genotype probability for the SNP.

rho_rv

The heterozygous genotype probability for the SNP.

rho_vv_H0

The variant (homozygote) genotype probability with Allelic Bias = 0.5.

rho_rr_H0

The reference (homozygote) genotype probability with Allelic Bias = 0.5.

rho_rv_H0

The heterozygous genotype probability with Allelic Bias = 0.5.

data_hash

Data frame. Input data frame with extra columns: allelefreq, genotype and genotype probabilities (prr, prv, pvv) per sample.
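The relationship between a likelihood ratio statistic such as AB_lrt and a p-value such as AB_p can be sketched as follows. This is a generic likelihood ratio test, not a transcription of the package's code: the two log-likelihood values are made up for illustration, and the use of a chi-squared reference distribution with one degree of freedom (one extra free parameter in the unconstrained fit) is an assumption.

```r
# Hypothetical log-likelihoods (illustrative values, not real output).
logLik_H1 <- -512.3  # heterozygous p-parameter estimated freely
logLik_H0 <- -516.9  # heterozygous p-parameter fixed at 0.5

# Likelihood ratio statistic: twice the gain in log-likelihood.
lrt_stat <- 2 * (logLik_H1 - logLik_H0)

# Assumed reference distribution: chi-squared with 1 degree of freedom.
lrt_p <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

print(lrt_stat)
print(lrt_p)
```

A small p-value indicates that letting the heterozygous p-parameter deviate from 0.5 improves the fit more than chance alone would explain.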