Challenge
To uncover the genetic and molecular underpinnings of disease, modern biomedical studies must integrate data from diverse sources, such as large-scale biobanks, single-cell experiments, and high-throughput sequencing. Each dataset presents unique challenges: biobank-scale analyses involve billions of variant-trait comparisons, single-cell studies require methods that capture cell-type-specific genetic effects, and sequencing pipelines must process raw data into usable results quickly and reproducibly.
Approach
Research teams are using ALCF supercomputers to address these challenges. In one large-scale analysis, investigators ran association tests across more than 1,000 traits and identified instances of pleiotropy, in which a single variant affects multiple traits or diseases. Another team developed scPrediXcan, a transcriptome-wide association framework that uses a deep-learning model to accurately predict single-cell gene expression and discover disease-associated genes from fine-grained biological signals. In a third effort, researchers benchmarked accelerated next-generation sequencing (NGS) analysis pipelines on HPC systems, evaluating their performance and scalability for real-world, high-throughput genomics workloads.
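To make the idea of pleiotropy concrete, the following is a minimal sketch of how one might flag variants associated with more than one trait from a table of association results. It is an illustration only and not the method used in the study: the function name, the variant and trait labels, the toy p-values, and the use of the conventional genome-wide significance cutoff of 5e-8 are all assumptions made for this example.

```python
from collections import defaultdict

# Conventional genome-wide significance cutoff; an assumption here,
# not necessarily the threshold used in the study.
GENOME_WIDE_P = 5e-8


def find_pleiotropic_variants(results, p_threshold=GENOME_WIDE_P, min_traits=2):
    """Return {variant: sorted traits} for variants significant in >= min_traits traits.

    `results` is an iterable of (variant_id, trait, p_value) tuples.
    """
    traits_by_variant = defaultdict(set)
    for variant_id, trait, p_value in results:
        # Keep only associations that pass the significance threshold.
        if p_value < p_threshold:
            traits_by_variant[variant_id].add(trait)
    # A variant counts as pleiotropic here if it is significant for several traits.
    return {
        variant: sorted(traits)
        for variant, traits in traits_by_variant.items()
        if len(traits) >= min_traits
    }


# Made-up summary statistics for illustration only.
toy_results = [
    ("rsA", "LDL cholesterol", 1e-12),
    ("rsA", "coronary artery disease", 3e-9),
    ("rsB", "height", 2e-10),
    ("rsB", "type 2 diabetes", 0.04),
    ("rsC", "asthma", 0.2),
]

pleiotropic = find_pleiotropic_variants(toy_results)
print(pleiotropic)
```

In this toy input only the hypothetical variant rsA passes the threshold for two traits. A real analysis would also need to account for correlation between nearby variants before counting associations as independent, which this sketch does not attempt.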
Results
The pleiotropy study produced one of the most comprehensive genome-wide assessments of pleiotropy to date, revealing extensive shared genetic architecture among traits and identifying more than 30,000 independent genetic associations. The scPrediXcan framework improved the detection of disease-associated genes by leveraging cell-type-specific models, producing insights that would be missed in bulk-tissue analyses. The NGS benchmarking effort identified optimized pipelines that dramatically reduce time-to-results for sequencing data, enabling faster turnaround for research and clinical applications. Together, these projects demonstrate how AI-driven methods, novel algorithms, and HPC infrastructure can transform the scale and resolution of biomedical analysis.
Impact
By applying AI and supercomputing to genetics, transcriptomics, and sequencing workflows, researchers can move from raw data to actionable insights more quickly and with greater precision. These advances lay the groundwork for more targeted disease research, faster diagnostic development, and a future of personalized treatments informed by large-scale genomic and cellular data.
Publications
Samarakoon, P. S., G. Fournous, L. T. Hansen, A. Wijesiri, S. Zhao, R. A. Alex, T. N. Nandi, R. Madduri, A. D. Rowe, G. Thomassen, E. Hovig, and S. Razick. “Benchmarking Accelerated Next-Generation Sequencing Analysis Pipelines,” Bioinformatics Advances (May 2025), Oxford University Press.
https://doi.org/10.1093/bioadv/vbaf085
Zhou, Y., T. Adeluwa, L. Zhu, et al. “scPrediXcan Integrates Advances in Deep Learning and Single-Cell Data into a Powerful Cell-Type-Specific Transcriptome-Wide Association Study Framework,” Cell Genomics (May 2025), Elsevier.
https://doi.org/10.1016/j.xgen.2025.100875
Levin, M. G., S. Koyama, J. Woerner, et al. “Genome-Wide Assessment of Pleiotropy Across >1000 Traits from Global Biobanks,” medRxiv (preprint), openRxiv.
https://doi.org/10.1101/2025.04.18.25326074