maelstRom_EMfitplot.Rd
maelstRom_EMfitplot
plots the results of maelstRom's beta-binomial mixture EM-fit. More specifically, it plots both the observed reference allele fractions as a histogram
(with observations optionally rescaled to observations with the same/closest likelihood but all sharing the same total count parameter; see ScaleHist
and ScaleCount
),
and beta-binomial PMFs using the parameters yielded by the EM-fit as (assuming a total read count ScaleCount
and compressed to the zero-to-one interval of the reference allele fraction instead of integer reference counts).
Optionally, the unshifted fit assuming no allelic bias (heterozygous pi-parameter = 0.5) is plotted as well,
if plot_NoShift
is TRUE and its distributional parameters are provided as input.
maelstRom_EMfitplot(
ref_counts = NULL,
var_counts = NULL,
pr = NULL,
prv = NULL,
pv = NULL,
theta_hom = NULL,
theta_het = NULL,
pr_NoShift = NULL,
prv_NoShift = NULL,
pv_NoShift = NULL,
theta_hom_NoShift = NULL,
theta_het_NoShift = NULL,
probshift,
SE,
MinCount = 0,
ScaleCount = 50,
ScaleHist = FALSE,
nbins = 100,
plot_NoShift = FALSE,
SplitPeaks = TRUE,
ShiftCols = NULL,
NoShiftCol = NULL,
wd_res = NULL,
chr = "",
position = "",
gene = "",
DataList_out = NULL,
Geno_AB_res = NULL,
dAD_res = NULL,
PlotWhich = "choose",
BG = "#F2F2F2"
)
Numeric vector. Observed reference counts.
Numeric vector. Observed variant counts.
Number. Reference homozygote genotype probability of the locus.
Number. Heterozygote genotype probability of the locus.
Number. Variant homozygote genotype probability of the locus.
Number. The beta-binomial overdispersion parameter, as theta, of the homozygous peaks.
Number. The beta-binomial overdispersion parameter, as theta, of the heterozygous peak.
Number. Optional; reference homozygote genotype probability of the locus under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).
Number. Optional; heterozygote genotype probability of the locus under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).
Number. Optional; variant homozygote genotype probability of the locus under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).
Number. Optional; the beta-binomial overdispersion parameter, as theta, of the homozygous peaks under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).
Number. Optional; the beta-binomial overdispersion parameter, as theta, of the heterozygous peak under the null hypothesis of no allelic bias (heterozygous pi-parameter = 0.5).
Number. The reference allele fraction in heterozygotes, i.e. the beta-binomial pi-parameter, which indicates allelic bias when deviating from 0.5
Number. Sequencing error rate.
Number. A minimal count filter for plotting only; samples with a lower count will not be included in the plotted histogram.
Number. Even though the observed number of reads can vary per sample, we have to assume a single number of observed reads to plot PMFs. One can opt to pick a set value for this (the default is 50) or e.g. use the locus' median or mean coverage here, so as to best correspond to the observed data.
Logical. See the remark given at ScaleCount
; if TRUE, tranforms the input data to all have the same total allele counts ScaleCount
, with the actual number
of reference reads being determined as the quantile function assuming ScaleCount
total counts with the same probability as the observed data (this probability being calculated
with the actually observed number of counts). The idea here is to better reflect how well the observed data corresponds to the fitted PMF, which is plotted for a single number of observed reads.
Number. Number of bins for the histogram depicting the observed reference allele fractions.
Logical. If TRUE, also plots the PMF of the unshifted fit (see "_NoShift" parameters above)
Logical. If TRUE, the PMFs of both homozygotes and the heterozygotes are plotted seperately and using different colors. If FALSE, one total PMF of the mixture model is plotted.
String (vector). Colors for the plotted PMFs of the fit incorporating allelic bias. Three values are required if SplitPeaks
is TRUE, otherwise just one value is required.
Default colors are used in case none are given as input.
String. Color for the unshifted PMF (no allelic bias; heterozygous pi-parameter = 0.5). There's no option to split the peaks of this unshifted fit as it would make the final figure way too clustered, so only one color value is required. Will default to purple in case no input is given.
String. Working directory where plots are saved; if non is given, the plot itself (as a ggplot2
object) is returned instead.
Number. Chromosome of the locus, to be used in naming the output file if a wd_res
is given.
Number. Position of the locus, to be used in naming the output file if a wd_res
is given.
String. Gene the locus is part of, to be used in naming the output file if a wd_res
is given; optional.
Dataframe. Alternatively, maelstRom_EMfitplot
accepts a dataframes containing per-sample data of the locus of interest. From this, it can
infer its ref_counts
, var_counts
, pr_NoShift
, prv_NoShift
and pv_NoShift
arguments. This is possible
as long as EMfit_betabinom_robust
was run on this dataframe with its fitH0
argument set to TRUE.
Dataframe with 1 row. Alternatively, maelstRom_EMfitplot
accepts a dataframe containing beta-binomial mixture model EM-fit results from calling EMfit_betabinom_robust
's
wrapper function BetaBinomGenotyping
. As such, expected columns are listed on BetaBinomGenotyping
's help page, more specifically
its produced Geno_AB_res
output. The one row provided must correspond to the locus of interest. From this, maelstRom_EMfitplot
can
infer its pr
, prv
, pv
, theta_hom
, theta_het
, theta_hom_NoShift
, theta_het_NoShift
and probshift
arguments.
Dataframe with 1 row. Alternatively, maelstRom_EMfitplot
accepts a dataframe containing beta-binomial mixture model EM-fit results from calling EMfit_betabinom_popcomb
's
wrapper function dAD_analysis
. As such, expected columns are listed on dAD_analysis
's help page, more specifically
its produced dAD_res
output. The one row provided must correspond to the locus of interest. From this, maelstRom_EMfitplot
can
infer its pr
, prv
, pv
, theta_hom
, theta_het
, and probshift
arguments. The "NoShift" case can not be plotted in this scenario,
as dAD-detection happens without AB-detection. If dAD_res
is not NULL, Geno_AB_res
has to be NULL and vice-versa. Also, when using dAD_res
,
you need to specify whether to plot controls or cases using the PlotWhich
argument.
String. Either "control" or "case", to know which data to fetch from dAD_res
for plotting.
The dataframe given as the DataList_out
argument should of course be of the corresponding population.
String. Background color of the returned plots (can be a color name, or a color's hexadecimal code; default "#F2F2F2")