dAD_analysis is a wrapper function bundling EMfit_betabinom_robust, EMfit_betabinom, EMfit_betabinom_popcomb, pmf_betabinomMix, and HWE_chisquared to run the entire differential Allelic Dispersion detection analysis as discussed in the package vignette. To this end, it goes over the following steps:

  1. Calling EMfit_betabinom_robust on both the control- and case-dataset separately for the sole purpose of outlier detection; detected outliers in both datasets play no part in future Expectation-Maximization-fits.

  2. Perform one fit on the joint (case+control) data using EMfit_betabinom, i.e. assuming all population parameters are shared including the heterozygous overdispersion parameter, i.e. assuming no differential Allelic Dispersion.

  3. Perform a fit on the control- and case-data using EMfit_betabinom_popcomb, which assumes all population parameters are shared except for the heteroyzous overdispersion parameter, i.e. a fit accomodating differential Allelic Dispersion

  4. Statistically test for differential Allelic Dispersion using a likelihood ratio test with one degree of freedom comparing the fits from the two previous points.

  5. Filling out the results dataframe dAD_res.

dAD_analysis(
  datas,
  SE,
  inbr,
  allelefreq = 0.5,
  dltaco = 10^-6,
  HWE = FALSE,
  p_InitEst = FALSE,
  p_inits = c(1/3, 1/3, 1/3),
  ThetaInits = "moment",
  ReEstThetas = "moment",
  NoSplitHom = TRUE,
  NoSplitHet = TRUE,
  ResetThetaMin = 10^-10,
  ResetThetaMax = 10^-1,
  DistRob = "Cook",
  CookMargin = 5,
  LikEmpNum = 1000,
  LikMargin = 0,
  NumHetMin = 5,
  MaxOutFrac = 0.5,
  thetaTRY = c(10^-1, 10^-3, 10^-7),
  probshift_init = 0.5,
  FirstFewFixed = NULL,
  MemLim = 2048,
  Xtra = 5,
  epsabs = 0.001,
  MaxIt = 100
)

Arguments

datas

A list containing two lists of dataframes, which in turn contain control- and case-data (in that order). Subsequent entries in both lists correspond to subsequent loci (lists must be the same size and named per-locus). Each of the dataframes should at least contain both a ref_count and var_count column.

SE

Number. Sequencing error rate population metaparameter.

inbr

Number. Inbreeding coefficient population metaparameter.

allelefreq, dltaco, HWE, p_InitEst, ThetaInits, ReEstThetas, NoSplitHom, NoSplitHet, ResetThetaMin, ResetThetaMax, DistRob, CookMargin, LikEmpNum, LikMargin, NumHetMin, MaxOutFrac, thetaTRY

Remaining control parameters to be passed to EMfit_betabinom_robust, EMfit_betabinom, and EMfit_betabinom_popcomb. See their respective documentation for more information. Usually, these parameter's default values are fine.

Value

A dataframe with one row per locus, containing for that locus in its columns:

Locus

Locus name.

Gene

Gene name.

dAD_pval

Raw likelihood ratio test p-value testing for differential allelic dispersion between controls and cases (i.e. a difference between ThetaHetC and ThetaHetT).

phi_rr

Estimated reference homozygote mixture component in the beta-binomial mixture dAD fit.

phi_rv

Estimated heterozygote mixture component in the beta-binomial mixture dAD fit.

phi_vv

Estimated variant homozygote mixture component in the beta-binomial mixture dAD fit.

Pi

Estimated heterozygous pi parameter (assumed equal in controls and cases) in the beta-binomial mixture dAD fit.

ThetaHetC

Estimated control heterozygous theta parameter in the beta-binomial mixture dAD fit.

ThetaHetT

Estimated case heterozygous theta parameter in the beta-binomial mixture dAD fit.

RhoC

Estimated control heterozygous rho parameter (0-to-1 rescaling of theta) in the beta-binomial mixture dAD fit.

RhoT

Estimated case heterozygous rho parameter (0-to-1 rescaling of theta) in the beta-binomial mixture dAD fit.

ThetaHom

Estimated homozygous theta parameter in the beta-binomial mixture dAD fit.

phi_rr_H0

Estimated reference homozygote mixture component in the beta-binomial mixture fit assuming no dAD.

phi_rv_H0

Estimated heterozygote mixture component in the beta-binomial mixture fit assuming no dAD.

phi_vv_H0

Estimated variant homozygote mixture component in the beta-binomial mixture fit assuming no dAD.

PiH0

Estimated heterozygous pi parameter (assumed equal in controls and cases) in the beta-binomial mixture fit assuming no dAD.

ThetaHetH0

Estimated heterozygous theta parameter in the beta-binomial mixture fit assuming no dAD.

RhoH0

Estimated heterozygous rho parameter (0-to-1 rescaling of theta) in the beta-binomial mixture fit assuming no dAD.

ThetaHomH0

Estimated homozygous theta parameter in the beta-binomial mixture fit assuming no dAD.

phi_rr_OnlyC

Estimated reference homozygote mixture component in the beta-binomial mixture fit on control samples alone.

phi_rv_OnlyC

Estimated heterozygote mixture component in the beta-binomial mixture fit on control samples alone.

phi_vv_OnlyC

Estimated variant homozygote mixture component in the beta-binomial mixture fit on control samples alone.

PiOnlyC

Estimated heterozygous pi parameter in the beta-binomial mixture fit on control samples alone.

ThetaHetOnlyC

Estimated heterozygous theta parameter in the beta-binomial mixture fit on control samples alone.

ThetaHomOnlyC

Estimated homozygous theta parameter in the beta-binomial mixture fit on control samples alone.

phi_rr_OnlyT

Estimated reference homozygote mixture component in the beta-binomial mixture fit on case samples alone.

phi_rv_OnlyT

Estimated heterozygote mixture component in the beta-binomial mixture fit on case samples alone.

phi_vv_OnlyT

Estimated variant homozygote mixture component in the beta-binomial mixture fit on case samples alone.

PiOnlyT

Estimated heterozygous pi parameter in the beta-binomial mixture fit on case samples alone.

ThetaHetOnlyT

Estimated heterozygous theta parameter in the beta-binomial mixture fit on case samples alone.

ThetaHomOnlyT

Estimated homozygous theta parameter in the beta-binomial mixture fit on case samples alone.

NumHetC

Estimated number of heterozygotes in controls, according to a fit on controls alone. Outliers, which are detected at the time of this fit, are counted towards this number, but not used in any following fitting procedures.

NumHetT

Estimated number of heterozygotes in cases, according to a fit on cases alone. Outliers, which are detected at the time of this fit, are counted towards this number, but not used in any following fitting procedures.

RobFlagC

The RobFlag output of the separate outlier-detection fit on controls, see EMfit_betabinom_robust's documentation.

RobFlagT

The RobFlag output of the separate outlier-detection fit on cases, see EMfit_betabinom_robust's documentation.

HWEC

HWE chi squared p-value for controls, see HWE_chisquared. Performed on the outlier-detection fit on controls alone.

HWET

HWE chi squared p-value for cases, see HWE_chisquared. Performed on the outlier-detection fit on cases alone.

CovC

Mean coverage (reference + variant count) across control data.

CovT

Mean coverage (reference + variant count) across case data.

CovC_Med

Median coverage (reference + variant count) across control data.

CovT_Med

Median coverage (reference + variant count) across case data.

NumOutC

Number of outliers detected in control data.

NumOutT

Number of outliers detected in case data.

nrep_H0

Number of expectation-maximization iterations run during the beta-binomial mixture fit assuming no dAD.

nrep_H1

Number of expectation-maximization iterations run during the beta-binomial dAD mixture fit.

QualityC

Quality flag returned by EMfit_betabinom_robust (see documentation) of the fit on control data alone.

QualityT

Quality flag returned by EMfit_betabinom_robust (see documentation) of the fit on case data alone.