Harnessing AI and Supercomputing to Accelerate Biomedical Discovery

Inflation statistics for within-population meta-analysis

Inflation statistics for within-population meta-analysis. Each dot represents one within-population meta-analysis. The horizontal axes show lambda GC values, and the vertical axis shows the attenuation ratio. Dotted lines mark an attenuation ratio of 0.2. Image: Ravi Madduri, Argonne National Laboratory
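
For readers unfamiliar with the two statistics named in the caption, here is a minimal Python sketch of how they are conventionally defined. It is not the code used in the study. Lambda GC is the median observed chi-square association statistic divided by the median expected under the null (about 0.455 for one degree of freedom). The attenuation ratio, as defined in LD score regression, is (intercept - 1) / (mean chi-square - 1). The function names and any input values are illustrative assumptions.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (null expectation).
NULL_MEDIAN_CHISQ_1DF = 0.4549364


def lambda_gc(chisq_stats):
    """Genomic-control inflation factor: observed median chi-square
    divided by the median expected under the null (1 df)."""
    return median(chisq_stats) / NULL_MEDIAN_CHISQ_1DF


def attenuation_ratio(ldsc_intercept, mean_chisq):
    """Share of the inflation in mean chi-square attributed to the
    LD score regression intercept rather than to polygenic signal."""
    if mean_chisq <= 1.0:
        # The ratio is not meaningful when there is no inflation to explain.
        raise ValueError("mean chi-square must exceed 1")
    return (ldsc_intercept - 1.0) / (mean_chisq - 1.0)
```

Under these definitions, a lambda GC above 1 paired with a low attenuation ratio is usually read as inflation driven mostly by polygenic signal rather than by confounding, which is presumably why the figure marks the 0.2 level.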

Case Study

Advancing biomedical science relies on analyzing vast genomic datasets, modeling disease processes at high resolution, and rapidly processing sequencing data. From statistically modeling millions of genetic variants to benchmarking data pipelines that handle terabytes of information in hours, these tasks demand immense computational power. Using Argonne Leadership Computing Facility (ALCF) resources, researchers are combining high-performance computing (HPC) and advanced AI methods to enable breakthroughs that could accelerate genetic discovery, refine disease understanding, and improve biomedical data workflows.

Challenge

To uncover the genetic and molecular underpinnings of disease, modern biomedical studies must integrate data from diverse sources, such as large-scale biobanks, single-cell experiments, and high-throughput sequencing. Each dataset presents unique challenges: biobank-scale analyses involve billions of variant-trait comparisons, single-cell studies require methods that capture cell-type-specific genetic effects, and sequencing pipelines must process raw data into usable results quickly and reproducibly.

Approach

Research teams are using ALCF supercomputers to address these challenges. In one large-scale analysis, investigators conducted association tests across more than 1,000 traits and identified instances of pleiotropy, in which a single variant affects multiple traits or diseases. Another team developed scPrediXcan, a transcriptome-wide association framework that uses a deep-learning model to predict single-cell gene expression and discover disease-associated genes from fine-grained biological signals. In a third effort, researchers benchmarked accelerated next-generation sequencing (NGS) analysis pipelines on HPC systems, evaluating their performance and scalability for real-world, high-throughput genomics workloads.
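
To make the notion of pleiotropy concrete, the sketch below flags variants that reach significance for more than one trait. It is a simplified illustration under stated assumptions, not the study's method: the input format, the function name, and the use of the conventional genome-wide threshold of 5e-8 are all assumptions, and a real analysis would also need to account for linkage disequilibrium and correlation between traits.

```python
from collections import defaultdict

# Conventional genome-wide significance threshold (an assumption here;
# the source does not state the threshold the study used).
GENOME_WIDE_P = 5e-8


def pleiotropic_variants(associations, p_threshold=GENOME_WIDE_P, min_traits=2):
    """Return {variant: sorted list of traits} for variants that are
    significantly associated with at least `min_traits` distinct traits.

    `associations` is an iterable of (variant_id, trait, p_value) tuples.
    """
    traits_by_variant = defaultdict(set)
    for variant_id, trait, p_value in associations:
        if p_value < p_threshold:
            traits_by_variant[variant_id].add(trait)
    return {
        variant_id: sorted(traits)
        for variant_id, traits in traits_by_variant.items()
        if len(traits) >= min_traits
    }


# Hypothetical records, for illustration only.
example_associations = [
    ("variant_A", "trait_1", 1e-12),
    ("variant_A", "trait_2", 3e-9),
    ("variant_B", "trait_1", 2e-10),
    ("variant_B", "trait_3", 0.04),  # not significant, so not counted
]
example_result = pleiotropic_variants(example_associations)
```

In this toy input only variant_A qualifies, because variant_B passes the threshold for a single trait.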

Results

The pleiotropy study produced one of the most comprehensive genome-wide assessments of pleiotropy to date, revealing extensive shared genetic architecture among traits and more than 30,000 independent genetic associations. The scPrediXcan framework improved the detection of disease-associated genes by using cell-type-specific models, producing insights that would be missed in bulk-tissue analyses. The NGS benchmarking effort identified optimized pipelines that dramatically reduce time-to-results for sequencing data, enabling faster turnaround for research and clinical applications. Together, these projects demonstrate how AI-driven methods, novel algorithms, and HPC infrastructure can transform the scale and resolution of biomedical analysis.

Impact

By applying AI and supercomputing to genetics, transcriptomics, and sequencing workflows, researchers can move from raw data to actionable insights more quickly and with greater precision. These advances lay the groundwork for more targeted disease research, faster diagnostic development, and a future of personalized treatments informed by large-scale genomic and cellular data.

Publications

Samarakoon, P. S., G. Fournous, L. T. Hansen, A. Wijesiri, S. Zhao, R. A. Alex, T. N. Nandi, R. Madduri, A. D. Rowe, G. Thomassen, E. Hovig, and S. Razick. “Benchmarking Accelerated Next-Generation Sequencing Analysis Pipelines,” Bioinformatics Advances (May 2025), Oxford University Press.
https://doi.org/10.1093/bioadv/vbaf085

Zhou, Y., T. Adeluwa, L. Zhu, et al. “scPrediXcan Integrates Advances in Deep Learning and Single-Cell Data into a Powerful Cell-Type-Specific Transcriptome-Wide Association Study Framework,” Cell Genomics (May 2025), Elsevier.
https://doi.org/10.1016/j.xgen.2025.100875

Levin, M. G., S. Koyama, J. Woerner, et al. “Genome-Wide Assessment of Pleiotropy Across >1000 Traits from Global Biobanks,” medRxiv (preprint), openRxiv.
https://doi.org/10.1101/2025.04.18.25326074
