maelstRom_EMfitplot plots the results of maelstRom's beta-binomial mixture EM-fit. More specifically, it plots both the observed reference allele fractions as a histogram (with observations optionally rescaled to observations with the same/closest likelihood but all sharing the same total count parameter; see ScaleHist and ScaleCount), and beta-binomial PMFs using the parameters yielded by the EM-fit as (assuming a total read count ScaleCount and compressed to the zero-to-one interval of the reference allele fraction instead of integer reference counts). Optionally, the unshifted fit assuming no allelic bias (heterozygous pi-parameter = 0.5) is plotted as well, if plot_NoShift is TRUE and its distributional parameters are provided as input.

maelstRom_EMfitplot(
  ref_counts = NULL,
  var_counts = NULL,
  pr = NULL,
  prv = NULL,
  pv = NULL,
  theta_hom = NULL,
  theta_het = NULL,
  pr_NoShift = NULL,
  prv_NoShift = NULL,
  pv_NoShift = NULL,
  theta_hom_NoShift = NULL,
  theta_het_NoShift = NULL,
  probshift,
  SE,
  MinCount = 0,
  ScaleCount = 50,
  ScaleHist = FALSE,
  nbins = 100,
  plot_NoShift = FALSE,
  SplitPeaks = TRUE,
  ShiftCols = NULL,
  NoShiftCol = NULL,
  wd_res = NULL,
  chr = "",
  position = "",
  gene = "",
  DataList_out = NULL,
  Geno_AB_res = NULL,
  dAD_res = NULL,
  PlotWhich = "choose",
  BG = "#F2F2F2"
)

Arguments

ref_counts

Numeric vector. Observed reference counts.

var_counts

Numeric vector. Observed variant counts.

pr

Number. Reference homozygote genotype probability of the locus.

prv

Number. Heterozygote genotype probability of the locus.

pv

Number. Variant homozygote genotype probability of the locus.

theta_hom

Number. The beta-binomial overdispersion parameter, as theta, of the homozygous peaks.

theta_het

Number. The beta-binomial overdispersion parameter, as theta, of the heterozygous peak.

pr_NoShift

Number. Optional; reference homozygote genotype probability of the locus under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).

prv_NoShift

Number. Optional; heterozygote genotype probability of the locus under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).

pv_NoShift

Number. Optional; variant homozygote genotype probability of the locus under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).

theta_hom_NoShift

Number. Optional; the beta-binomial overdispersion parameter, as theta, of the homozygous peaks under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).

theta_het_NoShift

Number. Optional; the beta-binomial overdispersion parameter, as theta, of the heterozygous peak under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).

probshift

Number. The reference allele fraction in heterozygotes, i.e. the beta-binomial pi-parameter, which indicates allelic bias when deviating from 0.5

SE

Number. Sequencing error rate.

MinCount

Number. A minimal count filter for plotting only; samples with a lower count will not be included in the plotted histogram.

ScaleCount

Number. Even though the observed number of reads can vary per sample, we have to assume a single number of observed reads to plot PMFs. One can opt to pick a set value for this (the default is 50) or e.g. use the locus' median or mean coverage here, so as to best correspond to the observed data.

ScaleHist

Logical. See the remark given at ScaleCount; if TRUE, tranforms the input data to all have the same total allele counts ScaleCount, with the actual number of reference reads being determined as the quantile function assuming ScaleCount total counts with the same probability as the observed data (this probability being calculated with the actually observed number of counts). The idea here is to better reflect how well the observed data corresponds to the fitted PMF, which is plotted for a single number of observed reads.

nbins

Number. Number of bins for the histogram depicting the observed reference allele fractions.

plot_NoShift

Logical. If TRUE, also plots the PMF of the unshifted fit (see "_NoShift" parameters above)

SplitPeaks

Logical. If TRUE, the PMFs of both homozygotes and the heterozygotes are plotted seperately and using different colors. If FALSE, one total PMF of the mixture model is plotted.

ShiftCols

String (vector). Colors for the plotted PMFs of the fit incorporating allelic bias. Three values are required if SplitPeaks is TRUE, otherwise just one value is required. Default colors are used in case none are given as input.

NoShiftCol

String. Color for the unshifted PMF (no allelic bias; heterozygous pi-parameter = 0.5). There's no option to split the peaks of this unshifted fit as it would make the final figure way too clustered, so only one color value is required. Will default to purple in case no input is given.

wd_res

String. Working directory where plots are saved; if non is given, the plot itself (as a ggplot2 object) is returned instead.

chr

Number. Chromosome of the locus, to be used in naming the output file if a wd_res is given.

position

Number. Position of the locus, to be used in naming the output file if a wd_res is given.

gene

String. Gene the locus is part of, to be used in naming the output file if a wd_res is given; optional.

DataList_out

Dataframe. Alternatively, maelstRom_EMfitplot accepts a dataframes containing per-sample data of the locus of interest. From this, it can infer its ref_counts, var_counts, pr_NoShift, prv_NoShift and pv_NoShift arguments. This is possible as long as EMfit_betabinom_robust was run on this dataframe with its fitH0 argument set to TRUE.

Geno_AB_res

Dataframe with 1 row. Alternatively, maelstRom_EMfitplot accepts a dataframe containing beta-binomial mixture model EM-fit results from calling EMfit_betabinom_robust's wrapper function BetaBinomGenotyping. As such, expected columns are listed on BetaBinomGenotyping's help page, more specifically its produced Geno_AB_res output. The one row provided must correspond to the locus of interest. From this, maelstRom_EMfitplot can infer its pr, prv, pv, theta_hom, theta_het, theta_hom_NoShift, theta_het_NoShift and probshift arguments.

dAD_res

Dataframe with 1 row. Alternatively, maelstRom_EMfitplot accepts a dataframe containing beta-binomial mixture model EM-fit results from calling EMfit_betabinom_popcomb's wrapper function dAD_analysis. As such, expected columns are listed on dAD_analysis's help page, more specifically its produced dAD_res output. The one row provided must correspond to the locus of interest. From this, maelstRom_EMfitplot can infer its pr, prv, pv, theta_hom, theta_het, and probshift arguments. The "NoShift" case can not be plotted in this scenario, as dAD-detection happens without AB-detection. If dAD_res is not NULL, Geno_AB_res has to be NULL and vice-versa. Also, when using dAD_res, you need to specify whether to plot controls or cases using the PlotWhich argument.

PlotWhich

String. Either "control" or "case", to know which data to fetch from dAD_res for plotting. The dataframe given as the DataList_out argument should of course be of the corresponding population.

BG

String. Background color of the returned plots (can be a color name, or a color's hexadecimal code; default "#F2F2F2")