LikelyDistsHet is mainly for internal use by EMfit_betabinom_robust. It helps robustify that function's EM fit by re-fitting the model once per data point, each time on the entire input dataset except that point, and recording how the heterozygous pi and theta estimates and the likelihood differ from their full-data counterparts. These differences measure the left-out data point's influence on the model fit; if any of them is sufficiently large, the data point could be considered an outlier.
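The leave-one-out idea described above can be sketched in base R. This is only an illustration and not the package's implementation: the helpers `dbetabinom_log`, `fit_bb` and `loo_dists` are hypothetical, the parameterisation (alpha = pi/theta, beta = (1 - pi)/theta) is an assumption, the sequencing error and the bimodal-fit option are ignored, and the likelihood distance is assumed to be the full-data log-likelihood evaluated at the full-data estimate versus the leave-one-out estimate.

```r
# Hypothetical helper: beta-binomial log-density with mean pi and
# overdispersion theta (ASSUMED: alpha = pi/theta, beta = (1 - pi)/theta).
dbetabinom_log <- function(k, n, pi, theta) {
  a <- pi / theta
  b <- (1 - pi) / theta
  lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b)
}

# Hypothetical helper: weighted maximum-likelihood fit of (pi, theta);
# the weights w play the role of sprv.
fit_bb <- function(ref, var, w, start = c(0.5, 0.05)) {
  nll <- function(p) -sum(w * dbetabinom_log(ref, ref + var, p[1], p[2]))
  opt <- optim(start, nll, method = "L-BFGS-B",
               lower = c(0.01, 0.005), upper = c(0.99, 5))
  list(par = opt$par, loglik = -opt$value)
}

# Leave each sample out in turn, refit, and record the distances
# to the full-data fit.
loo_dists <- function(ref, var, w) {
  full <- fit_bb(ref, var, w)
  n <- length(ref)
  out <- data.frame(PiDist = numeric(n), ThetaDist = numeric(n),
                    LikDist = numeric(n))
  for (i in seq_len(n)) {
    refit <- fit_bb(ref[-i], var[-i], w[-i], start = full$par)
    # Full-data log-likelihood at the leave-one-out estimate (assumption).
    ll_refit <- sum(w * dbetabinom_log(ref, ref + var,
                                       refit$par[1], refit$par[2]))
    out$PiDist[i]    <- refit$par[1] - full$par[1]
    out$ThetaDist[i] <- refit$par[2] - full$par[2]
    out$LikDist[i]   <- 2 * (full$loglik - ll_refit)
  }
  out
}

# Toy data: 30 samples at a total depth of 40, one made deliberately extreme.
set.seed(1)
n_tot <- rep(40, 30)
ref <- rbinom(30, n_tot, 0.5)
ref[30] <- 39
d <- loo_dists(ref, n_tot - ref, w = rep(1, 30))
head(d[order(-abs(d$ThetaDist)), ])  # most influential samples first
```

Refitting from the full-data estimate (`start = full$par`) keeps each leave-one-out fit cheap, since removing a single sample usually moves the optimum only slightly.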

LikelyDistsHet(
  ref_counts,
  var_counts,
  sprv,
  parvec_cur,
  NoSplitHet,
  ResetThetaMin,
  ResetThetaMax,
  SE,
  ReEstPars = FALSE
)

Arguments

ref_counts

Numeric vector. Reference counts.

var_counts

Numeric vector. Variant counts.

sprv

Numeric vector. Each sample's EM-weight reflecting its likelihood to be part of the heterozygous population.

parvec_cur

Numeric vector. Pi and theta (in that order) of the heterozygous peak of the full-data fit.

NoSplitHet

Logical. If TRUE, don't allow the beta-binomial fit for heterozygotes to be bimodal.

ResetThetaMin

Number. Initial theta values in the numeric optimization are raised to this minimum if they fall below it (e.g. when the moment estimate is even lower).

ResetThetaMax

Number. Initial theta values in the numeric optimization are lowered to this maximum if they exceed it (e.g. when the moment estimate is even higher).

SE

Number. Sequencing error rate.

ReEstPars

Logical. If TRUE, re-estimates parvec_cur given ref_counts and var_counts. This is pointless if these are the actual counts of the full dataset, but it is useful for an empirical approach in which the parameter and likelihood distances "expected" when the assumed model is 100% correct are simulated by drawing ref_counts and var_counts from this assumed model (see EMfit_betabinom_robust).
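Drawing counts from an assumed model, as mentioned for ReEstPars, might look roughly like the sketch below. The helper `simulate_counts` is hypothetical, and the parameterisation (alpha = pi/theta, beta = (1 - pi)/theta) is an assumption. The sequencing error is not modelled. A beta-binomial draw is obtained by first drawing a per-sample reference fraction from a beta distribution and then drawing the reference count from a binomial.

```r
# Hypothetical helper: simulate reference/variant counts from a beta-binomial
# with mean pi and overdispersion theta.
# ASSUMED parameterisation: alpha = pi/theta, beta = (1 - pi)/theta.
simulate_counts <- function(n_total, pi, theta) {
  a <- pi / theta
  b <- (1 - pi) / theta
  p <- rbeta(length(n_total), a, b)            # per-sample reference fraction
  ref <- rbinom(length(n_total), n_total, p)   # reference counts
  list(ref_counts = ref, var_counts = n_total - ref)
}

set.seed(42)
sim <- simulate_counts(n_total = rep(50, 200), pi = 0.5, theta = 0.05)
str(sim)
```

Counts simulated this way would be passed as ref_counts and var_counts with ReEstPars = TRUE, so that the resulting distances reflect what is expected when the model holds exactly.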

Value

A list containing the following components:

LikDists

A vector containing likelihood distances per sample (twice the difference between the full-data log-likelihood and the re-fitted log-likelihood obtained when leaving out the sample).

PiDists

A vector containing pi distances per sample (leave-sample-out refitted pi minus full-data pi).

ThetaDists

A vector containing theta distances per sample (leave-sample-out refitted theta minus full-data theta).
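One way the returned components could be turned into outlier flags is to compare them against distances obtained under the assumed model. The sketch below is hypothetical: the function `flag_outliers`, the 0.99 quantile threshold and the mock values are illustrative assumptions, not the rule used by EMfit_betabinom_robust.

```r
# Hypothetical rule: flag a sample when the absolute value of its distance
# exceeds a high quantile of the absolute "expected" (simulated) distances.
flag_outliers <- function(obs, null, level = 0.99) {
  abs(obs) > quantile(abs(null), probs = level, names = FALSE)
}

# Mock output with the same component names as the returned list.
obs <- list(
  LikDists   = c(0.1, 0.3, 9.5),
  PiDists    = c(0.001, -0.002, 0.04),
  ThetaDists = c(0.002, 0.001, 0.03)
)

# Mock "expected" distances; in practice these would come from simulated data.
null <- list(
  LikDists   = seq(0, 1, length.out = 101),
  PiDists    = seq(-0.01, 0.01, length.out = 101),
  ThetaDists = seq(-0.01, 0.01, length.out = 101)
)

# A sample is flagged if any of its three distances is extreme.
flagged <- flag_outliers(obs$LikDists, null$LikDists) |
  flag_outliers(obs$PiDists, null$PiDists) |
  flag_outliers(obs$ThetaDists, null$ThetaDists)
which(flagged)
```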