maelstRom explorations

The results from maelstRom’s Allelic Dispersion tutorial can be explored even further with auxiliary data analyses, as documented in this tutorial. It starts from the previously obtained Allelic Dispersion tutorial results, which are loaded into the workspace below.

library(maelstRom)
#> Warning:
#> replacing
#> previous import
#> 'data.table::yearmon'
#> by
#> 'zoo::yearmon'
#> when loading
#> 'maelstRom'
#> Warning:
#> replacing
#> previous import
#> 'data.table::yearqtr'
#> by
#> 'zoo::yearqtr'
#> when loading
#> 'maelstRom'
#> Package 'maelstRom' version 1.1.11
data("VignetteImage", package = "maelstRom")

knitr::kable(head(dAD_res))
Locus Gene dAD_pval phi_rr phi_rv phi_vv Pi ThetaHetC ThetaHetT RhoC RhoT ThetaHom phi_rr_H0 phi_rv_H0 phi_vv_H0 PiH0 ThetaHetH0 RhoH0 ThetaHomH0 phi_rr_OnlyC phi_rv_OnlyC phi_vv_OnlyC PiOnlyC ThetaHetOnlyC ThetaHomOnlyC phi_rr_OnlyT phi_rv_OnlyT phi_vv_OnlyT PiOnlyT ThetaHetOnlyT ThetaHomOnlyT NumHetC NumHetT RobFlagC RobFlagT HWEC HWET CovC CovT CovC_Med CovT_Med NumOutC NumOutT nrep_H0 nrep_H1 QualityC QualityT AD_EX_Corr AD_EX_Corr_pval
Locus1 Gene1 0.0145737 0.4266438 0.4312392 0.1421169 0.4952374 0.0243768 0.0515579 0.0237967 0.0490300 0.0216788 0.4251959 0.4319180 0.1428861 0.4973311 0.0424388 0.0407111 0.0208894 0.3904620 0.4845264 0.1250116 0.4898086 0.0173647 0.0602616 0.4472249 0.4033651 0.1494100 0.4947613 0.0539605 0.0168325 62.01938 108.10186 NoOutliers NoOutliers 0.9189807 0.4769813 199.52344 270.52985 190.5 255 0 0 18 18 0.3159567 0.0008651
Locus2 Gene1 0.0001449 0.7040331 0.2767914 0.0191755 0.5494551 0.0141543 0.0751320 0.0139567 0.0698817 0.0335140 0.7042999 0.2753414 0.0203586 0.5390438 0.0504500 0.0480271 0.0351454 0.7031276 0.2812474 0.0156250 0.5647487 0.0130104 0.0366680 0.7046997 0.2748898 0.0204105 0.5235947 0.0742229 0.0322330 35.99966 73.67046 NoOutliers NoOutliers 0.8149959 0.8801815 105.92188 125.97388 96.0 114 0 0 11 14 0.2560135 0.0288009
Locus3 Gene1 0.0000012 0.7066314 0.2747330 0.0186356 0.4951042 0.0073516 0.0785041 0.0072980 0.0727898 0.0485708 0.7073771 0.2716372 0.0209857 0.4772722 0.0454193 0.0434460 0.0537090 0.7031522 0.2812201 0.0156277 0.5093620 0.0066971 0.0837429 0.7084733 0.2720498 0.0194768 0.4576387 0.0733042 0.0398699 35.99617 72.63730 NoOutliers NoOutliers 0.8151967 0.8697809 124.57031 150.34082 116.5 134 0 0 19 14 0.1352185 0.2540302
Locus4 Gene2 0.3304979 0.3821658 0.4984763 0.1193579 0.6275309 0.0208067 0.0386687 0.0203826 0.0372291 0.0272748 0.3823811 0.4982913 0.1193276 0.6279068 0.0317866 0.0308073 0.0261031 0.4163886 0.4900306 0.0935808 0.6226814 0.0202838 0.0481506 0.3651954 0.5030342 0.1317704 0.6311273 0.0387137 0.0117994 62.72392 134.31014 NoOutliers NoOutliers 0.7073233 0.7003476 28.30469 20.26592 27.0 20 0 0 19 13 0.3249952 0.0106015
Locus5 Gene2 0.6137441 0.5690665 0.3787043 0.0522292 0.5263937 0.0188209 0.0262031 0.0184732 0.0255340 0.0440111 0.5694019 0.3783571 0.0522410 0.5268711 0.0222538 0.0217694 0.0443944 0.5469213 0.3828949 0.0701838 0.5143016 0.0180381 0.0449810 0.5790841 0.3774147 0.0435011 0.5350999 0.0263470 0.0422924 49.01055 100.76974 NoOutliers NoOutliers 0.9999981 0.7300886 35.84375 25.41573 34.0 24 0 0 10 27 -0.2740332 0.0357078
Locus6 Gene2 0.3453054 0.1686252 0.4541586 0.3772162 0.6812257 0.0213316 0.0021231 0.0208861 0.0021186 0.0267148 0.1671000 0.4554184 0.3774816 0.6853380 0.0132906 0.0131163 0.0299717 0.1370091 0.5524136 0.3105773 0.7119506 0.0159835 0.0207748 0.1869743 0.4025645 0.4104612 0.6594220 0.0000000 0.0320631 70.70894 106.27703 NoOutliers NoOutliers 0.4919931 0.2581178 16.69531 11.32197 15.0 10 0 0 16 37 -0.3421650 0.2763032

Exploration: Sample-level AD correlations

If AD is reflective of underlying (epi)genetic dysregulation, it stands to reason that the most extreme (allelically imbalanced) samples that contribute most to increased AD (i.e. in the plots above: those heterozygotes deviating most from \(\pi_{het}\) - marked in a green vertical dotted line - and thus contributing most to broadening of the heterozygous beta-binomial fit) are also the most (epi)genetically dysregulated ones. We can try to detect such a link by correlating each heterozygous sample’s extremity to any known (epi)genetic dysregulation of these samples, such as (promotor) methylation or the occurrence of copy number alterations.

As this tutorial relies on SNP counts alone, we correlate this sample extremity, or “sample-level AD” to the amount of expression itself. After all, the previously mentioned mechanisms (promotor hypermethylation, copy number gains or losses) are known to impact a gene’s expression; so, accordingly, those very dysregulated samples showing extreme sample-level AD could show the most deviant (differential) expression as well. To take count biases into account, the actual measure for a sample’s extremity (sample-level AD) is not e.g. simply deviance from \(\pi_{het}\), but the -log of the minimum of that sample’s heterozygous cumulative distribution function or its complement, which will be higher the more extreme of an observation the sample is.

Though correlating AD with expression is the default setting of the AD_correlator function below, it accepts any per-sample measure (if e.g. hypermethylation percentages or any other dysregulation data are available) as an optional input.

AD_EX_Corrs <- c()
AD_EX_Corr_pvals <- c()

for(LocOI in dAD_res$Locus){
  CountDF <- caseList[[LocOI]]
  dAD_fitres <- dAD_res[dAD_res$Locus == LocOI,]
  
  CorResult <- AD_correlator(CountDF, dAD_fitres, MinCount = 20, method = "spearman")
  
  AD_EX_Corrs <- c(AD_EX_Corrs, as.numeric(CorResult["correlation"]))
  AD_EX_Corr_pvals <- c(AD_EX_Corr_pvals, as.numeric(CorResult["p-value"]))
}

dAD_res$AD_EX_Corr <- AD_EX_Corrs
dAD_res$AD_EX_Corr_pval <- AD_EX_Corr_pvals

cat(paste0((dAD_res$Locus[!is.na(dAD_res$AD_EX_Corr_pval) & dAD_res$AD_EX_Corr_pval < 0.05 & dAD_res$HWEC > 0.001 & dAD_res$HWET > 0.001]), ",")) 

Locus1, Locus2, Locus4, Locus5, Locus40, Locus74, Locus85, Locus93, Locus112, Locus119, Locus141, Locus174, Locus190, Locus191,

A minimum count filter of 20 is used here to avoid low-count-based biases in the correlations’ computation. Eventually, the returned number of actually significant correlations is relatively small. But they are meant as more of a data exploration tool rather than a hard result or filter criterion anyway, as their analysis isn’t as robust as the population-level AD estimates to detect biologically relevant genes, and both AD in expression can be regulated in many different ways and influenced by many biases (e.g. tumor purity), making overly strong correlations between them and/or other (epi)genetic data not evident. Nevertheless, in cases where e.g. hypermethylation is strongly suspected to cause an allelic imbalance in sequencing data of afflicted samples, computing these correlations can provide soft evidence to that effect.

Session info

sessionInfo()
#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Dutch_Belgium.utf8 
#> [2] LC_CTYPE=Dutch_Belgium.utf8   
#> [3] LC_MONETARY=Dutch_Belgium.utf8
#> [4] LC_NUMERIC=C                  
#> [5] LC_TIME=Dutch_Belgium.utf8    
#> 
#> time zone: Europe/Brussels
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats    
#> [2] graphics 
#> [3] grDevices
#> [4] utils    
#> [5] datasets 
#> [6] methods  
#> [7] base     
#> 
#> other attached packages:
#> [1] maelstRom_1.1.11
#> 
#> loaded via a namespace (and not attached):
#>  [1] gmp_0.7-5          
#>  [2] sass_0.4.9         
#>  [3] utf8_1.2.4         
#>  [4] generics_0.1.3     
#>  [5] gtools_3.9.5       
#>  [6] stringi_1.8.3      
#>  [7] lattice_0.22-5     
#>  [8] digest_0.6.35      
#>  [9] magrittr_2.0.3     
#> [10] evaluate_0.24.0    
#> [11] grid_4.3.3         
#> [12] fastmap_1.2.0      
#> [13] jsonlite_1.8.8     
#> [14] ggnewscale_0.5.0   
#> [15] purrr_1.0.2        
#> [16] fansi_1.0.6        
#> [17] scales_1.3.0       
#> [18] numDeriv_2016.8-1.1
#> [19] textshaping_0.4.0  
#> [20] jquerylib_0.1.4    
#> [21] Rdpack_2.6.1       
#> [22] cli_3.6.2          
#> [23] rlang_1.1.3        
#> [24] rbibutils_2.2.16   
#> [25] munsell_0.5.1      
#> [26] MAGE_1.0.0.18      
#> [27] cachem_1.1.0       
#> [28] yaml_2.3.9         
#> [29] tools_4.3.3        
#> [30] parallel_4.3.3     
#> [31] memoise_2.0.1      
#> [32] dplyr_1.1.4        
#> [33] colorspace_2.1-0   
#> [34] ggplot2_3.5.1      
#> [35] hash_2.2.6.3       
#> [36] vctrs_0.6.5        
#> [37] R6_2.5.1           
#> [38] zoo_1.8-12         
#> [39] lifecycle_1.0.4    
#> [40] stringr_1.5.1      
#> [41] fs_1.6.4           
#> [42] htmlwidgets_1.6.4  
#> [43] MASS_7.3-60.0.1    
#> [44] ragg_1.3.2         
#> [45] pkgconfig_2.0.3    
#> [46] desc_1.4.3         
#> [47] pkgdown_2.0.9      
#> [48] pillar_1.9.0       
#> [49] bslib_0.7.0        
#> [50] gtable_0.3.5       
#> [51] data.table_1.16.0  
#> [52] glue_1.7.0         
#> [53] Rcpp_1.0.12        
#> [54] systemfonts_1.1.0  
#> [55] xfun_0.45          
#> [56] tibble_3.2.1       
#> [57] tidyselect_1.2.1   
#> [58] rstudioapi_0.16.0  
#> [59] knitr_1.48         
#> [60] htmltools_0.5.8    
#> [61] patchwork_1.2.0    
#> [62] rmarkdown_2.27     
#> [63] compiler_4.3.3     
#> [64] alabama_2023.1.0