maelstRom results exploration

maelstRom explorations

The results from maelstRom’s Allelic Dispersion tutorial can be explored even further with auxiliary data analyses, as documented in this tutorial. It starts from the previously obtained Allelic Dispersion tutorial results, which are loaded into the workspace below.

library(maelstRom)
#> Package 'maelstRom' version 1.1.11
data("VignetteImage", package = "maelstRom")

knitr::kable(head(dAD_res))

Locus	Gene	dAD_pval	phi_rr	phi_rv	phi_vv	Pi	ThetaHetC	ThetaHetT	RhoC	RhoT	ThetaHom	phi_rr_H0	phi_rv_H0	phi_vv_H0	PiH0	ThetaHetH0	RhoH0	ThetaHomH0	phi_rr_OnlyC	phi_rv_OnlyC	phi_vv_OnlyC	PiOnlyC	ThetaHetOnlyC	ThetaHomOnlyC	phi_rr_OnlyT	phi_rv_OnlyT	phi_vv_OnlyT	PiOnlyT	ThetaHetOnlyT	ThetaHomOnlyT	NumHetC	NumHetT	RobFlagC	RobFlagT	HWEC	HWET	CovC	CovT	CovC_Med	CovT_Med	nrep_H0	nrep_H1	AD_EX_Corr	AD_EX_Corr_pval
Locus1	Gene1	0.0145737	0.4266438	0.4312392	0.1421169	0.4952374	0.0243768	0.0515579	0.0237967	0.0490300	0.0216788	0.4251959	0.4319180	0.1428861	0.4973311	0.0424388	0.0407111	0.0208894	0.3904620	0.4845264	0.1250116	0.4898086	0.0173647	0.0602616	0.4472249	0.4033651	0.1494100	0.4947613	0.0539605	0.0168325	62.01938	108.10186	NoOutliers	NoOutliers	0.9189807	0.4769813	199.52344	270.52985	190.5	255	18	18	0.3159567	0.0008651
Locus2	Gene1	0.0001449	0.7040331	0.2767914	0.0191755	0.5494551	0.0141543	0.0751320	0.0139567	0.0698817	0.0335140	0.7042999	0.2753414	0.0203586	0.5390438	0.0504500	0.0480271	0.0351454	0.7031276	0.2812474	0.0156250	0.5647487	0.0130104	0.0366680	0.7046997	0.2748898	0.0204105	0.5235947	0.0742229	0.0322330	35.99966	73.67046	NoOutliers	NoOutliers	0.8149959	0.8801815	105.92188	125.97388	96.0	114	11	14	0.2560135	0.0288009
Locus3	Gene1	0.0000012	0.7066314	0.2747330	0.0186356	0.4951042	0.0073516	0.0785041	0.0072980	0.0727898	0.0485708	0.7073771	0.2716372	0.0209857	0.4772722	0.0454193	0.0434460	0.0537090	0.7031522	0.2812201	0.0156277	0.5093620	0.0066971	0.0837429	0.7084733	0.2720498	0.0194768	0.4576387	0.0733042	0.0398699	35.99617	72.63730	NoOutliers	NoOutliers	0.8151967	0.8697809	124.57031	150.34082	116.5	134	19	14	0.1352185	0.2540302
Locus4	Gene2	0.3304979	0.3821658	0.4984763	0.1193579	0.6275309	0.0208067	0.0386687	0.0203826	0.0372291	0.0272748	0.3823811	0.4982913	0.1193276	0.6279068	0.0317866	0.0308073	0.0261031	0.4163886	0.4900306	0.0935808	0.6226814	0.0202838	0.0481506	0.3651954	0.5030342	0.1317704	0.6311273	0.0387137	0.0117994	62.72392	134.31014	NoOutliers	NoOutliers	0.7073233	0.7003476	28.30469	20.26592	27.0	20	19	13	0.3249952	0.0106015
Locus5	Gene2	0.6137441	0.5690665	0.3787043	0.0522292	0.5263937	0.0188209	0.0262031	0.0184732	0.0255340	0.0440111	0.5694019	0.3783571	0.0522410	0.5268711	0.0222538	0.0217694	0.0443944	0.5469213	0.3828949	0.0701838	0.5143016	0.0180381	0.0449810	0.5790841	0.3774147	0.0435011	0.5350999	0.0263470	0.0422924	49.01055	100.76974	NoOutliers	NoOutliers	0.9999981	0.7300886	35.84375	25.41573	34.0	24	10	27	-0.2740332	0.0357078
Locus6	Gene2	0.3453054	0.1686252	0.4541586	0.3772162	0.6812257	0.0213316	0.0021231	0.0208861	0.0021186	0.0267148	0.1671000	0.4554184	0.3774816	0.6853380	0.0132906	0.0131163	0.0299717	0.1370091	0.5524136	0.3105773	0.7119506	0.0159835	0.0207748	0.1869743	0.4025645	0.4104612	0.6594220	0.0000000	0.0320631	70.70894	106.27703	NoOutliers	NoOutliers	0.4919931	0.2581178	16.69531	11.32197	15.0	10	16	37	-0.3421650	0.2763032

Exploration: Sample-level AD correlations

If AD is reflective of underlying (epi)genetic dysregulation, it stands to reason that the most extreme (allelically imbalanced) samples that contribute most to increased AD (i.e. in the plots above: those heterozygotes deviating most from \(\pi_{het}\) - marked in a green vertical dotted line - and thus contributing most to broadening of the heterozygous beta-binomial fit) are also the most (epi)genetically dysregulated ones. We can try to detect such a link by correlating each heterozygous sample’s extremity to any known (epi)genetic dysregulation of these samples, such as (promotor) methylation or the occurrence of copy number alterations.

As this tutorial relies on SNP counts alone, we correlate this sample extremity, or “sample-level AD” to the amount of expression itself. After all, the previously mentioned mechanisms (promotor hypermethylation, copy number gains or losses) are known to impact a gene’s expression; so, accordingly, those very dysregulated samples showing extreme sample-level AD could show the most deviant (differential) expression as well. To take count biases into account, the actual measure for a sample’s extremity (sample-level AD) is not e.g. simply deviance from \(\pi_{het}\), but the -log of the minimum of that sample’s heterozygous cumulative distribution function or its complement, which will be higher the more extreme of an observation the sample is.

Though correlating AD with expression is the default setting of the AD_correlator function below, it accepts any per-sample measure (if e.g. hypermethylation percentages or any other dysregulation data are available) as an optional input.

AD_EX_Corrs <- c()
AD_EX_Corr_pvals <- c()

for(LocOI in dAD_res$Locus){
  CountDF <- caseList[[LocOI]]
  dAD_fitres <- dAD_res[dAD_res$Locus == LocOI,]
  
  CorResult <- maelstRom::AD_correlator(CountDF, dAD_fitres, SE = SEmedian, MinCount = 20, method = "spearman")
  
  AD_EX_Corrs <- c(AD_EX_Corrs, as.numeric(CorResult["correlation"]))
  AD_EX_Corr_pvals <- c(AD_EX_Corr_pvals, as.numeric(CorResult["p-value"]))
}

dAD_res$AD_EX_Corr <- AD_EX_Corrs
dAD_res$AD_EX_Corr_pval <- AD_EX_Corr_pvals

cat(paste0((dAD_res$Locus[!is.na(dAD_res$AD_EX_Corr_pval) & dAD_res$AD_EX_Corr_pval < 0.05 & dAD_res$HWEC > 0.001 & dAD_res$HWET > 0.001]), ","))

Locus1, Locus2, Locus4, Locus5, Locus40, Locus74, Locus85, Locus93, Locus112, Locus119, Locus141, Locus174, Locus190, Locus191,

A minimum count filter of 20 is used here to avoid low-count-based biases in the correlations’ computation. Eventually, the returned number of actually significant correlations is relatively small. But they are meant as more of a data exploration tool rather than a hard result or filter criterion anyway, as their analysis isn’t as robust as the population-level AD estimates to detect biologically relevant genes, and both AD in expression can be regulated in many different ways and influenced by many biases (e.g. tumor purity), making overly strong correlations between them and/or other (epi)genetic data not evident. Nevertheless, in cases where e.g. hypermethylation is strongly suspected to cause an allelic imbalance in sequencing data of afflicted samples, computing these correlations can provide soft evidence to that effect.

Session info

sessionInfo()
#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Dutch_Belgium.utf8 
#> [2] LC_CTYPE=Dutch_Belgium.utf8   
#> [3] LC_MONETARY=Dutch_Belgium.utf8
#> [4] LC_NUMERIC=C                  
#> [5] LC_TIME=Dutch_Belgium.utf8    
#> 
#> time zone: Europe/Brussels
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats    
#> [2] graphics 
#> [3] grDevices
#> [4] utils    
#> [5] datasets 
#> [6] methods  
#> [7] base     
#> 
#> other attached packages:
#> [1] maelstRom_1.1.11
#> 
#> loaded via a namespace (and not attached):
#>  [1] gmp_0.7-5          
#>  [2] sass_0.4.10        
#>  [3] generics_0.1.4     
#>  [4] gtools_3.9.5       
#>  [5] lattice_0.22-5     
#>  [6] stringi_1.8.3      
#>  [7] digest_0.6.35      
#>  [8] magrittr_2.0.3     
#>  [9] rgl_1.3.1          
#> [10] evaluate_1.0.4     
#> [11] grid_4.3.3         
#> [12] RColorBrewer_1.1-3 
#> [13] fastmap_1.2.0      
#> [14] jsonlite_1.8.8     
#> [15] ggnewscale_0.5.2   
#> [16] Formula_1.2-5      
#> [17] purrr_1.0.2        
#> [18] scales_1.4.0       
#> [19] numDeriv_2016.8-1.1
#> [20] textshaping_1.0.1  
#> [21] jquerylib_0.1.4    
#> [22] Rdpack_2.6.4       
#> [23] abind_1.4-8        
#> [24] cli_3.6.2          
#> [25] rlang_1.1.3        
#> [26] rbibutils_2.2.16   
#> [27] base64enc_0.1-3    
#> [28] matlib_1.0.0       
#> [29] cachem_1.1.0       
#> [30] yaml_2.3.10        
#> [31] tools_4.3.3        
#> [32] parallel_4.3.3     
#> [33] memoise_2.0.1      
#> [34] dplyr_1.1.4        
#> [35] ggplot2_3.5.2      
#> [36] hash_2.2.6.3       
#> [37] vctrs_0.6.5        
#> [38] R6_2.6.1           
#> [39] zoo_1.8-12         
#> [40] lifecycle_1.0.4    
#> [41] stringr_1.5.1      
#> [42] fs_1.6.6           
#> [43] car_3.1-3          
#> [44] htmlwidgets_1.6.4  
#> [45] MASS_7.3-60.0.1    
#> [46] ragg_1.4.0         
#> [47] pkgconfig_2.0.3    
#> [48] desc_1.4.3         
#> [49] pkgdown_2.0.9      
#> [50] pillar_1.10.2      
#> [51] bslib_0.9.0        
#> [52] gtable_0.3.6       
#> [53] Rcpp_1.0.12        
#> [54] data.table_1.16.0  
#> [55] glue_1.7.0         
#> [56] systemfonts_1.2.3  
#> [57] xfun_0.52          
#> [58] tibble_3.2.1       
#> [59] tidyselect_1.2.1   
#> [60] rstudioapi_0.17.1  
#> [61] knitr_1.50         
#> [62] farver_2.1.2       
#> [63] xtable_1.8-4       
#> [64] htmltools_0.5.8    
#> [65] patchwork_1.3.1    
#> [66] rmarkdown_2.29     
#> [67] carData_3.0-5      
#> [68] compiler_4.3.3     
#> [69] alabama_2023.1.0

Cedric Stroobandt

2025-10-11

maelstRom explorations

Exploration: Sample-level AD correlations

Session info