maelstRom_results_exploration.Rmd
The results from maelstRom’s Allelic Dispersion tutorial can be explored even further with auxiliary data analyses, as documented in this tutorial. It starts from the previously obtained Allelic Dispersion tutorial results, which are loaded into the workspace below.
library(maelstRom)
#> Warning:
#> replacing
#> previous import
#> 'data.table::yearmon'
#> by
#> 'zoo::yearmon'
#> when loading
#> 'maelstRom'
#> Warning:
#> replacing
#> previous import
#> 'data.table::yearqtr'
#> by
#> 'zoo::yearqtr'
#> when loading
#> 'maelstRom'
#> Package 'maelstRom' version 1.1.11
data("VignetteImage", package = "maelstRom")
knitr::kable(head(dAD_res))
Locus | Gene | dAD_pval | phi_rr | phi_rv | phi_vv | Pi | ThetaHetC | ThetaHetT | RhoC | RhoT | ThetaHom | phi_rr_H0 | phi_rv_H0 | phi_vv_H0 | PiH0 | ThetaHetH0 | RhoH0 | ThetaHomH0 | phi_rr_OnlyC | phi_rv_OnlyC | phi_vv_OnlyC | PiOnlyC | ThetaHetOnlyC | ThetaHomOnlyC | phi_rr_OnlyT | phi_rv_OnlyT | phi_vv_OnlyT | PiOnlyT | ThetaHetOnlyT | ThetaHomOnlyT | NumHetC | NumHetT | RobFlagC | RobFlagT | HWEC | HWET | CovC | CovT | CovC_Med | CovT_Med | NumOutC | NumOutT | nrep_H0 | nrep_H1 | QualityC | QualityT | AD_EX_Corr | AD_EX_Corr_pval |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Locus1 | Gene1 | 0.0145737 | 0.4266438 | 0.4312392 | 0.1421169 | 0.4952374 | 0.0243768 | 0.0515579 | 0.0237967 | 0.0490300 | 0.0216788 | 0.4251959 | 0.4319180 | 0.1428861 | 0.4973311 | 0.0424388 | 0.0407111 | 0.0208894 | 0.3904620 | 0.4845264 | 0.1250116 | 0.4898086 | 0.0173647 | 0.0602616 | 0.4472249 | 0.4033651 | 0.1494100 | 0.4947613 | 0.0539605 | 0.0168325 | 62.01938 | 108.10186 | NoOutliers | NoOutliers | 0.9189807 | 0.4769813 | 199.52344 | 270.52985 | 190.5 | 255 | 0 | 0 | 18 | 18 | 0.3159567 | 0.0008651 | ||
Locus2 | Gene1 | 0.0001449 | 0.7040331 | 0.2767914 | 0.0191755 | 0.5494551 | 0.0141543 | 0.0751320 | 0.0139567 | 0.0698817 | 0.0335140 | 0.7042999 | 0.2753414 | 0.0203586 | 0.5390438 | 0.0504500 | 0.0480271 | 0.0351454 | 0.7031276 | 0.2812474 | 0.0156250 | 0.5647487 | 0.0130104 | 0.0366680 | 0.7046997 | 0.2748898 | 0.0204105 | 0.5235947 | 0.0742229 | 0.0322330 | 35.99966 | 73.67046 | NoOutliers | NoOutliers | 0.8149959 | 0.8801815 | 105.92188 | 125.97388 | 96.0 | 114 | 0 | 0 | 11 | 14 | 0.2560135 | 0.0288009 | ||
Locus3 | Gene1 | 0.0000012 | 0.7066314 | 0.2747330 | 0.0186356 | 0.4951042 | 0.0073516 | 0.0785041 | 0.0072980 | 0.0727898 | 0.0485708 | 0.7073771 | 0.2716372 | 0.0209857 | 0.4772722 | 0.0454193 | 0.0434460 | 0.0537090 | 0.7031522 | 0.2812201 | 0.0156277 | 0.5093620 | 0.0066971 | 0.0837429 | 0.7084733 | 0.2720498 | 0.0194768 | 0.4576387 | 0.0733042 | 0.0398699 | 35.99617 | 72.63730 | NoOutliers | NoOutliers | 0.8151967 | 0.8697809 | 124.57031 | 150.34082 | 116.5 | 134 | 0 | 0 | 19 | 14 | 0.1352185 | 0.2540302 | ||
Locus4 | Gene2 | 0.3304979 | 0.3821658 | 0.4984763 | 0.1193579 | 0.6275309 | 0.0208067 | 0.0386687 | 0.0203826 | 0.0372291 | 0.0272748 | 0.3823811 | 0.4982913 | 0.1193276 | 0.6279068 | 0.0317866 | 0.0308073 | 0.0261031 | 0.4163886 | 0.4900306 | 0.0935808 | 0.6226814 | 0.0202838 | 0.0481506 | 0.3651954 | 0.5030342 | 0.1317704 | 0.6311273 | 0.0387137 | 0.0117994 | 62.72392 | 134.31014 | NoOutliers | NoOutliers | 0.7073233 | 0.7003476 | 28.30469 | 20.26592 | 27.0 | 20 | 0 | 0 | 19 | 13 | 0.3249952 | 0.0106015 | ||
Locus5 | Gene2 | 0.6137441 | 0.5690665 | 0.3787043 | 0.0522292 | 0.5263937 | 0.0188209 | 0.0262031 | 0.0184732 | 0.0255340 | 0.0440111 | 0.5694019 | 0.3783571 | 0.0522410 | 0.5268711 | 0.0222538 | 0.0217694 | 0.0443944 | 0.5469213 | 0.3828949 | 0.0701838 | 0.5143016 | 0.0180381 | 0.0449810 | 0.5790841 | 0.3774147 | 0.0435011 | 0.5350999 | 0.0263470 | 0.0422924 | 49.01055 | 100.76974 | NoOutliers | NoOutliers | 0.9999981 | 0.7300886 | 35.84375 | 25.41573 | 34.0 | 24 | 0 | 0 | 10 | 27 | -0.2740332 | 0.0357078 | ||
Locus6 | Gene2 | 0.3453054 | 0.1686252 | 0.4541586 | 0.3772162 | 0.6812257 | 0.0213316 | 0.0021231 | 0.0208861 | 0.0021186 | 0.0267148 | 0.1671000 | 0.4554184 | 0.3774816 | 0.6853380 | 0.0132906 | 0.0131163 | 0.0299717 | 0.1370091 | 0.5524136 | 0.3105773 | 0.7119506 | 0.0159835 | 0.0207748 | 0.1869743 | 0.4025645 | 0.4104612 | 0.6594220 | 0.0000000 | 0.0320631 | 70.70894 | 106.27703 | NoOutliers | NoOutliers | 0.4919931 | 0.2581178 | 16.69531 | 11.32197 | 15.0 | 10 | 0 | 0 | 16 | 37 | -0.3421650 | 0.2763032 |
If AD is reflective of underlying (epi)genetic dysregulation, it stands to reason that the most extreme (allelically imbalanced) samples that contribute most to increased AD (i.e. in the plots above: those heterozygotes deviating most from \(\pi_{het}\) - marked in a green vertical dotted line - and thus contributing most to broadening of the heterozygous beta-binomial fit) are also the most (epi)genetically dysregulated ones. We can try to detect such a link by correlating each heterozygous sample’s extremity to any known (epi)genetic dysregulation of these samples, such as (promotor) methylation or the occurrence of copy number alterations.
As this tutorial relies on SNP counts alone, we correlate this sample extremity, or “sample-level AD” to the amount of expression itself. After all, the previously mentioned mechanisms (promotor hypermethylation, copy number gains or losses) are known to impact a gene’s expression; so, accordingly, those very dysregulated samples showing extreme sample-level AD could show the most deviant (differential) expression as well. To take count biases into account, the actual measure for a sample’s extremity (sample-level AD) is not e.g. simply deviance from \(\pi_{het}\), but the -log of the minimum of that sample’s heterozygous cumulative distribution function or its complement, which will be higher the more extreme of an observation the sample is.
Though correlating AD with expression is the default setting of the
AD_correlator
function below, it accepts any per-sample
measure (if e.g. hypermethylation percentages or any other dysregulation
data are available) as an optional input.
AD_EX_Corrs <- c()
AD_EX_Corr_pvals <- c()
for(LocOI in dAD_res$Locus){
CountDF <- caseList[[LocOI]]
dAD_fitres <- dAD_res[dAD_res$Locus == LocOI,]
CorResult <- AD_correlator(CountDF, dAD_fitres, MinCount = 20, method = "spearman")
AD_EX_Corrs <- c(AD_EX_Corrs, as.numeric(CorResult["correlation"]))
AD_EX_Corr_pvals <- c(AD_EX_Corr_pvals, as.numeric(CorResult["p-value"]))
}
dAD_res$AD_EX_Corr <- AD_EX_Corrs
dAD_res$AD_EX_Corr_pval <- AD_EX_Corr_pvals
cat(paste0((dAD_res$Locus[!is.na(dAD_res$AD_EX_Corr_pval) & dAD_res$AD_EX_Corr_pval < 0.05 & dAD_res$HWEC > 0.001 & dAD_res$HWET > 0.001]), ","))
Locus1, Locus2, Locus4, Locus5, Locus40, Locus74, Locus85, Locus93, Locus112, Locus119, Locus141, Locus174, Locus190, Locus191,
A minimum count filter of 20 is used here to avoid low-count-based biases in the correlations’ computation. Eventually, the returned number of actually significant correlations is relatively small. But they are meant as more of a data exploration tool rather than a hard result or filter criterion anyway, as their analysis isn’t as robust as the population-level AD estimates to detect biologically relevant genes, and both AD in expression can be regulated in many different ways and influenced by many biases (e.g. tumor purity), making overly strong correlations between them and/or other (epi)genetic data not evident. Nevertheless, in cases where e.g. hypermethylation is strongly suspected to cause an allelic imbalance in sequencing data of afflicted samples, computing these correlations can provide soft evidence to that effect.
sessionInfo()
#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=Dutch_Belgium.utf8
#> [2] LC_CTYPE=Dutch_Belgium.utf8
#> [3] LC_MONETARY=Dutch_Belgium.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Dutch_Belgium.utf8
#>
#> time zone: Europe/Brussels
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats
#> [2] graphics
#> [3] grDevices
#> [4] utils
#> [5] datasets
#> [6] methods
#> [7] base
#>
#> other attached packages:
#> [1] maelstRom_1.1.11
#>
#> loaded via a namespace (and not attached):
#> [1] gmp_0.7-5
#> [2] sass_0.4.9
#> [3] utf8_1.2.4
#> [4] generics_0.1.3
#> [5] gtools_3.9.5
#> [6] stringi_1.8.3
#> [7] lattice_0.22-5
#> [8] digest_0.6.35
#> [9] magrittr_2.0.3
#> [10] evaluate_0.24.0
#> [11] grid_4.3.3
#> [12] fastmap_1.2.0
#> [13] jsonlite_1.8.8
#> [14] ggnewscale_0.5.0
#> [15] purrr_1.0.2
#> [16] fansi_1.0.6
#> [17] scales_1.3.0
#> [18] numDeriv_2016.8-1.1
#> [19] textshaping_0.4.0
#> [20] jquerylib_0.1.4
#> [21] Rdpack_2.6.1
#> [22] cli_3.6.2
#> [23] rlang_1.1.3
#> [24] rbibutils_2.2.16
#> [25] munsell_0.5.1
#> [26] MAGE_1.0.0.18
#> [27] cachem_1.1.0
#> [28] yaml_2.3.9
#> [29] tools_4.3.3
#> [30] parallel_4.3.3
#> [31] memoise_2.0.1
#> [32] dplyr_1.1.4
#> [33] colorspace_2.1-0
#> [34] ggplot2_3.5.1
#> [35] hash_2.2.6.3
#> [36] vctrs_0.6.5
#> [37] R6_2.5.1
#> [38] zoo_1.8-12
#> [39] lifecycle_1.0.4
#> [40] stringr_1.5.1
#> [41] fs_1.6.4
#> [42] htmlwidgets_1.6.4
#> [43] MASS_7.3-60.0.1
#> [44] ragg_1.3.2
#> [45] pkgconfig_2.0.3
#> [46] desc_1.4.3
#> [47] pkgdown_2.0.9
#> [48] pillar_1.9.0
#> [49] bslib_0.7.0
#> [50] gtable_0.3.5
#> [51] data.table_1.16.0
#> [52] glue_1.7.0
#> [53] Rcpp_1.0.12
#> [54] systemfonts_1.1.0
#> [55] xfun_0.45
#> [56] tibble_3.2.1
#> [57] tidyselect_1.2.1
#> [58] rstudioapi_0.16.0
#> [59] knitr_1.48
#> [60] htmltools_0.5.8
#> [61] patchwork_1.2.0
#> [62] rmarkdown_2.27
#> [63] compiler_4.3.3
#> [64] alabama_2023.1.0