Final filtering of maelstRom analysis results and writing to output files

final_filter retains only those SNPs, across all chromosomes, that are significantly imprinted after adjusting for multiple testing and that are of interest, i.e. that have a suitable GOF and degree of (median) imprinting. Depending on the file_* arguments, it also writes results and allelic-count files to results_wd. When both file_all_counts and file_impr_counts are FALSE, the function can be used simply to filter the results_df input.

final_filter(
  data_hash,
  results_df,
  results_wd,
  gof_filt = 1.2,
  adj_p_filt = 0.05,
  med_impr_filt = 0.8,
  i_filt = 0.6,
  file_all = TRUE,
  file_impr = TRUE,
  file_all_counts = FALSE,
  file_impr_counts = TRUE
)

Arguments

data_hash: Hash. Hash keyed by SNP position, holding a data frame for every SNP position.
results_df: Data frame. Results data frame with columns: "position", "gene", "LRT", "p", "estimated.i", "allele.frequency", "dbSNP", "reference", "variant", "est_SE", "coverage", "nr_samples", "GOF", "symmetry", "med_impr", "est_inbreeding", "tot_inbreeding".
results_wd: String. Directory where results files are written to.
gof_filt: Number. Minimum Goodness of Fit (GOF) of a locus, where GOF is mean(log(sample likelihood under the imprinted model * sample coverage + 1)) taken across the samples of that locus (default is 1.2).
adj_p_filt: Number. Significance level applied to the FDR-adjusted p-values (default is 0.05).
med_impr_filt: Number. Minimum median imprinting (default is 0.8).
i_filt: Number. Minimum degree of imprinting (default is 0.6).
file_all: Logical. Whether to write a file with the information on all SNPs, imprinted and non-imprinted (default is TRUE).
file_impr: Logical. Whether to write a file with the information on imprinted SNPs only (default is TRUE).
file_all_counts: Logical. Whether to write a file with the allelic counts of all SNPs, imprinted and non-imprinted (default is FALSE).
file_impr_counts: Logical. Whether to write a file with the allelic counts of imprinted SNPs only (default is TRUE).
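To make the GOF definition used by gof_filt concrete, here is a minimal sketch. locus_gof is a hypothetical helper, not part of maelstRom. It assumes the per-sample likelihoods under the imprinted model have already been computed and that "log" is the natural logarithm (R's default).

```r
# Hypothetical helper (not maelstRom code): GOF of one locus, computed as
# mean over samples of log(likelihood under imprinted model * coverage + 1).
# Assumes natural log and precomputed per-sample likelihoods.
locus_gof <- function(sample_lik, sample_coverage) {
  stopifnot(length(sample_lik) == length(sample_coverage))
  mean(log(sample_lik * sample_coverage + 1))
}

# Toy locus with three samples (illustrative numbers only)
lik <- c(0.20, 0.05, 0.10)
coverage <- c(30, 40, 25)
gof <- locus_gof(lik, coverage)
gof >= 1.2  # would this locus pass the default gof_filt?
```

Because of the "+ 1", a sample with zero likelihood or zero coverage contributes log(1) = 0 rather than minus infinity, so a single badly fitting sample lowers the GOF without making it undefined.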

Value

Data frame with results filtered on adjusted p-value, GOF, median imprinting and degree of imprinting.
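The filtering step can be sketched in base R as follows. filter_results_sketch and the added adj_p column are hypothetical and are not the maelstRom implementation. The sketch assumes that the "FDR adjustment" is Benjamini-Hochberg (p.adjust with method = "BH") applied to the "p" column, that "degree of imprinting" is the "estimated.i" column, that a SNP must pass all four thresholds, and that the comparisons are inclusive.

```r
# Hypothetical sketch of the filtering logic (not the maelstRom source).
# Assumptions: BH FDR adjustment of "p"; all four criteria must hold;
# inclusive thresholds; rows with NA in any criterion are dropped.
filter_results_sketch <- function(results_df,
                                  gof_filt = 1.2,
                                  adj_p_filt = 0.05,
                                  med_impr_filt = 0.8,
                                  i_filt = 0.6) {
  # Adjust p-values for multiple testing across all SNPs (all chromosomes)
  adj_p <- p.adjust(results_df$p, method = "BH")
  keep <- adj_p <= adj_p_filt &
    results_df$GOF >= gof_filt &
    results_df$med_impr >= med_impr_filt &
    results_df$estimated.i >= i_filt
  keep[is.na(keep)] <- FALSE
  out <- results_df[keep, , drop = FALSE]
  out$adj_p <- adj_p[keep]  # hypothetical column holding adjusted p-values
  out
}

# Toy input using a subset of the documented results_df columns
toy <- data.frame(
  position = c("chr1_100", "chr1_200", "chr2_300", "chr3_400"),
  p = c(0.001, 0.002, 0.04, 0.0005),
  GOF = c(1.5, 1.3, 1.6, 0.9),
  med_impr = c(0.9, 0.85, 0.95, 0.9),
  estimated.i = c(0.8, 0.5, 0.9, 0.7),
  stringsAsFactors = FALSE
)
res <- filter_results_sketch(toy)
res$position
```

In this toy input, chr1_200 fails on degree of imprinting (0.5 < 0.6) and chr3_400 fails on GOF (0.9 < 1.2), even though both are significant after adjustment.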