AllelicMeta_est calculates sequencing error rates, inbreeding coefficients, allele frequencies, genotypes and genotype probabilities from ref_counts and var_counts for a specific locus. To this end it fits a (regular) binomial mixture model with no allelic bias in heterozygotes (both alleles equally likely). This somewhat simplistic model (no overdispersion parameter, no allelic bias) makes for an inferior but very fast fit, which is ideal to get rough estimates of the allelic population metaparameters (sequencing error rate and inbreeding coefficient) by e.g. taking their mean or median across a large amount of loci. The per-sample allele frequencies and genotype (probabilities) are less reliable and, as such, not this function's main use.

AllelicMeta_est(
  ref_counts,
  var_counts,
  deltaF = 10^-8,
  maxIT = 100,
  SE_prior = 0.002,
  F_inbr_prior = NULL,
  HetProb = 0.5
)

Arguments

ref_counts

Numeric vector. Reference counts.

var_counts

Numeric vector. Variant counts.

deltaF

Number. Expectation-Maximisation threshold, minimal difference between two consecutive iterations (default is 1e-08).

maxIT

Number. Maximum number of iterations of the Expectation-Maximisation algorithm (default is 100).

SE_prior

Number. Initial estimate of the sequencing error rate (default is 0.002).

F_inbr_prior

Number. Initial estimate of the inbreeding coefficient used for calculating initial genotype frequencies (default is NULL, in which case the initial genotype frequences all get set to 1/3).

HetProb

Number. Allelic bias in heterozygotes (expected reference over total allele count in RNAseq data; default is 0.5

Value

A list containing the following components:

allelefreq

The estimated allele frequency.

SE

The estimated sequencing error rate.

F_inbr

The estimated inbreeding coefficient.

genotypes

The most likely genotype (rr, rv, vv) of each sample.

genoprobs

The genotype probabilties (p(rr), p(rv), p(vv)) for each sample.

nrep

The number of iterations

Examples

AllelicMeta_est(c(5, 8, 10, 3, 5, 6, 23), c(8, 8, 6, 4, 4, 10, 0))
#> $allelefreq
#> [1] 0.5714285
#> 
#> $SE
#> [1] 0
#> 
#> $F_inbr
#> [1] -0.7500002
#> 
#> $genotypes
#> [1] "rv" "rv" "rv" "rv" "rv" "rv" "rr"
#> 
#> $genoprobs
#>       p.rr.        p.rv. p.vv.
#> 1 0.0000000 1.000000e+00     0
#> 2 0.0000000 1.000000e+00     0
#> 3 0.0000000 1.000000e+00     0
#> 4 0.0000000 1.000000e+00     0
#> 5 0.0000000 1.000000e+00     0
#> 6 0.0000000 1.000000e+00     0
#> 7 0.9999993 7.152558e-07     0
#> 
#> $nrep
#> [1] 3
#> 
AllelicMeta_est(c(5, 0, 0, 3, 5, 1, 23), c(1, 8, 6, 2, 0, 10, 0),
    SE_prior = 0.2, F_inbr_prior = 0.1)
#> $allelefreq
#> [1] 0.4931604
#> 
#> $SE
#> [1] 0.03413917
#> 
#> $F_inbr
#> [1] 0.678992
#> 
#> $genotypes
#> [1] "rr" "vv" "vv" "rv" "rr" "vv" "rr"
#> 
#> $genoprobs
#>          p.rr.        p.rv.        p.vv.
#> 1 8.253493e-01 1.746494e-01 1.330900e-06
#> 2 2.353486e-12 1.936346e-03 9.980637e-01
#> 3 1.873889e-09 7.187552e-03 9.928124e-01
#> 4 7.935723e-02 9.177449e-01 2.897870e-03
#> 5 9.857576e-01 1.424238e-02 5.618446e-08
#> 6 8.271749e-14 7.557152e-03 9.924428e-01
#> 7 9.999999e-01 1.029936e-07 4.226079e-34
#> 
#> $nrep
#> [1] 20
#>