AdvancedFitter.Rd
AdvancedFitter
attempts to detect relationships between beta-binomial parameters (pi, theta) and any independent variable(s) suspected to play a role,
e.g. theta (overdispersion) increasing as a function of age.
It is currently more of an exploratory function, however: it fits these relationships under the assumption that the supplied "WeightCol"
reflects the "true" probability that a certain observation belongs to the genotype under study (heterozygotes), and keeps these weights fixed throughout the fit.
There is of course no such thing (in reality, an observation has one true underlying genotype, not "a probability"), and, ideally,
the parameter-variable relationships studied here would be taken into account during the complete beta-binomial mixture model EM fit.
AdvancedFitter(
ThetaForm = Theta ~ 1,
PiForm = NULL,
SNPdata,
RefCol = "ref_count",
VarCol = "var_count",
WeightCol = NULL,
Pi_start,
Theta_start,
PiLink = "identity",
ThetaLink = "identity",
center = TRUE,
ResetThetaMin = 10^-10,
ResetThetaMax = 10^-1
)
Formula. Describes the (linear) relationship between theta and any independent variables present in `SNPdata`, like `Theta ~ Var1 + Var2`; the "ThetaLink" input (see later) extends this to generalized linear relationships.
Formula. Describes the (linear) relationship between pi and any independent variables present in `SNPdata`, like `Pi ~ Var1 + Var2`; the "PiLink" input (see later) extends this to generalized linear relationships.
Dataframe. Contains reference and variant allele counts, in the columns named by the "RefCol" and "VarCol" inputs. This dataframe must also contain every independent variable used in the "ThetaForm" and "PiForm" input arguments.
String. Name of the column in SNPdata containing reference allele counts.
String. Name of the column in SNPdata containing variant allele counts.
String. Optional name of a column of SNPdata containing per-sample weights which, if specified, are used in a weighted maximum likelihood fit (maximizing sum(sample-weights * sample-log-likelihoods)).
Number. Starting pi value for numerical optimization (when fitting PiForm, this starting value is used as the intercept and all independent variables start with a regression coefficient of zero).
Number. Starting theta value for numerical optimization (when fitting ThetaForm, this starting value is used as the intercept and all independent variables start with a regression coefficient of zero).
String. One of "identity", "log" or "sqrt"; the linear relationship specified by PiForm is fit as PiLink(pi) ~ PiForm
String. One of "identity", "log" or "sqrt"; the linear relationship specified by ThetaForm is fit as ThetaLink(theta) ~ ThetaForm
Logical. If TRUE, centers all independent variables (by subtracting their mean) before performing the maximum likelihood fit. This is recommended when the supplied Pi_start is an expected/mean pi-value across all samples (e.g. the result of a single pi-fit on these samples).
Number. When the supplied Theta_start value is lower than this input, it is reset to this input. Default 10^-10; it is not recommended to change this value.
Number. When the supplied Theta_start value is higher than this input, it is reset to this input. Default 10^-1; it is not recommended to change this value.
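To illustrate how the arguments above interact, here is a minimal base-R sketch of a weighted fit of `Theta ~ age` with a constant pi. It is not the package's implementation. The helper names (`bb_loglik`, `inv_link`, `clamp_theta`, `weighted_nll`), the toy data, and the beta-binomial parametrization (shape parameters pi/theta and (1 - pi)/theta) are all assumptions made for this example. Only the identity link is exercised, so the question of how a starting value is mapped onto a non-identity link scale is left open.

```r
# Sketch only: base-R stand-ins for the documented steps, NOT the package's code.

# ASSUMPTION: beta-binomial with mean pi and overdispersion theta, i.e.
# shape parameters alpha = pi / theta and beta = (1 - pi) / theta.
bb_loglik <- function(ref, var, pi, theta) {
  a <- pi / theta
  b <- (1 - pi) / theta
  lchoose(ref + var, ref) + lbeta(ref + a, var + b) - lbeta(a, b)
}

# Inverse functions for the three documented link names.
inv_link <- list(
  identity = function(eta) eta,
  log      = function(eta) exp(eta),
  sqrt     = function(eta) eta^2
)

# Hypothetical helper: reset a starting theta that falls outside [lo, hi].
clamp_theta <- function(theta_start, lo = 10^-10, hi = 10^-1) {
  min(max(theta_start, lo), hi)
}

# Hypothetical helper: weighted negative log-likelihood for Theta ~ x with a
# constant pi. par = c(pi intercept, theta intercept, theta coefficient).
weighted_nll <- function(par, ref, var, x, w, theta_link = "identity") {
  theta <- inv_link[[theta_link]](par[2] + par[3] * x)
  # Reject parameter values outside the valid range.
  if (par[1] <= 0 || par[1] >= 1 || any(theta <= 0)) return(Inf)
  -sum(w * bb_loglik(ref, var, par[1], theta))
}

# Made-up toy data; column names follow the RefCol/VarCol defaults.
toy <- data.frame(
  ref_count = c(12, 30, 7, 18, 25, 9),
  var_count = c(10, 22, 11, 20, 19, 14),
  age       = c(31, 45, 52, 60, 38, 70),
  w         = c(0.9, 0.8, 1.0, 0.7, 0.95, 0.85)
)

# center = TRUE: subtract the mean from the independent variable.
x_centered <- toy$age - mean(toy$age)

# Intercepts start at the (clamped) starting values, the coefficient at zero.
start <- c(0.5, clamp_theta(0.5), 0)

fit <- optim(start, weighted_nll,
             ref = toy$ref_count, var = toy$var_count,
             x = x_centered, w = toy$w)
fit$par    # pi intercept, theta intercept, theta coefficient
fit$value  # minimized weighted negative log-likelihood
```

Because the independent variable is centered, the theta intercept in this sketch is the theta value at the mean age, which is why a starting value obtained from a covariate-free fit is a sensible intercept start.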
A list containing the following components:
The negative of the maximized log-likelihood.
Optimized parameter values, given in the order (1) pi-intercept, (2) theta-intercept, (3) pi regression coefficients, (4) theta regression coefficients.
If theta depends on only one independent variable, the mean of that variable across samples.
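As a rough sketch of how these components might be used together, the code below splits a flat parameter vector laid out in the documented order and evaluates theta at an uncentered value of a single independent variable. The helper names, the list element names, and the numbers are hypothetical. The use of the returned mean to undo centering is an assumption based on the `center` argument, not something this page states.

```r
# Hypothetical helper: split a parameter vector ordered as
# (1) pi intercept, (2) theta intercept, (3) pi coefficients, (4) theta coefficients.
split_params <- function(par, n_pi, n_theta) {
  stopifnot(length(par) == 2 + n_pi + n_theta)
  list(
    pi_intercept    = par[1],
    theta_intercept = par[2],
    pi_coef         = par[2 + seq_len(n_pi)],
    theta_coef      = par[2 + n_pi + seq_len(n_theta)]
  )
}

# Hypothetical helper: theta at raw value x of the single independent variable.
# ASSUMPTION: the variable was centered on center_mean before fitting.
predict_theta <- function(p, x, center_mean, inv = identity) {
  inv(p$theta_intercept + p$theta_coef * (x - center_mean))
}

# Made-up values: no pi covariates, one theta covariate with mean 50.
p <- split_params(c(0.5, 0.02, 0.001), n_pi = 0, n_theta = 1)
predict_theta(p, x = 60, center_mean = 50)  # theta ten units above the mean
```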