Murat Keçeli, a computational scientist at the U.S. Department of Energy’s (DOE) Argonne National Laboratory, discusses his efforts to ensure that the first users of the Argonne Leadership Computing Facility’s (ALCF’s) forthcoming exascale system, Aurora, are able to work with Jupyter Notebooks, interactive web platforms common in high performance computing (HPC) research environments. The ALCF is a U.S. DOE Office of Science user facility.
How long have you worked in HPC?
The first job I submitted to an HPC cluster dates back to the summer of 2007, when I started working as a PhD student. It was a 32-node Beowulf cluster, Haku, and I was mainly submitting independent jobs to each node, which is generally considered capacity computing. The first supercomputer I ran jobs on was the ALCF’s Mira system, which dates back to the time I started as a postdoctoral researcher at Argonne in 2014. Only then did I learn about running jobs in capability mode, which roughly means running a single job parallelized over thousands of nodes using MPI. I should note that I owe most of what I learned about supercomputers to ATPESC (the Argonne Training Program on Extreme-Scale Computing), which truly was a life-changing experience.
Exascale development is largely uncharted territory. What is it like working on a project to enable science on these powerful, first-of-a-kind systems?
It is very exciting for sure, but there is also significant pressure. Basically, we have the potential to study problems orders of magnitude larger than what we can tackle with conventionally available resources. However, translating that potential into a true capability that other scientists can access is a tough problem. Developing scientific software with the features, the flexibility, and the performance required for the most challenging applications of our time can take many person-years of effort. The Exascale Computing Project (ECP) gave us the time to be ready when the exascale machines are powered up; however, continued support is needed to make sure these projects live up to the standards of their users.
What are Jupyter Notebooks? How do they aid scientific research?
I see Jupyter Notebooks as interactive training, research, and development environments. Jupyter Notebooks enable coding, documentation, and visualization to be combined into a single document. Hence, they are very useful for experimenting with new ideas, prototyping, performing data analysis, and writing tutorials. While initially supporting only Julia, Python, and R, Jupyter Notebooks now support more than 40 programming languages. Moreover, cloud services such as Google Colab and GitHub Codespaces enable effective collaboration using Jupyter Notebooks. All these features have made Jupyter Notebooks the most popular platform for data scientists, and there is an ongoing effort to use Jupyter Notebooks for reproducible research and publications. I’d like to emphasize their role in education and training as well. They are now an integral part of academic courses and workshops, providing interactive tutorials for students.
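Under the hood, a notebook that combines code and documentation in one document is just a JSON file (the `.ipynb` format). A minimal sketch using only the Python standard library, with purely illustrative cell contents:

```python
import json

# A Jupyter notebook is a JSON document: a list of cells (markdown or
# code) plus kernel metadata and format version numbers. The cell
# contents below are illustrative, not from any real analysis.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Energy analysis\n", "Documentation lives beside the code."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["energies = [1.0, 2.5, 4.0]\n", "print(sum(energies))"],
        },
    ],
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# Serializing this dictionary yields a file Jupyter can open directly.
text = json.dumps(notebook, indent=1)
print(len(text), "bytes of .ipynb JSON")
```

Because the format is plain JSON, notebooks are easy to generate, diff, and convert programmatically, which is part of what makes them attractive for reproducible research.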
How will Jupyter Notebooks be used with Aurora? Will using Jupyter Notebooks with exascale systems differ from using them with traditional CPU-based systems?
The shift from CPU-based systems to GPU-based systems parallels the shift from traditional simulation-based workflows to artificial intelligence/machine learning- (AI/ML-) based workflows. This means a significant change in the HPC user profile. Since Jupyter Notebooks are more popular among data science users, it is natural to expect more utilization of Jupyter Notebooks with Aurora. I believe ALCF JupyterHub will provide a more user-friendly (compared to traditional terminal-based access) gateway to Aurora, where users can submit and monitor their jobs and analyze the data on the fly.
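Submitting a job from a notebook could, for instance, wrap the batch scheduler's command line. A minimal sketch assuming a PBS-style `qsub` interface; the script name, queue, and project here are hypothetical placeholders, not real ALCF values:

```python
import shlex

def build_qsub_command(script, nodes, walltime, queue, project):
    """Assemble a PBS-style `qsub` command line for a batch job.

    All argument values are illustrative; real queue and project
    names come from the facility.
    """
    return [
        "qsub",
        "-l", f"select={nodes}",
        "-l", f"walltime={walltime}",
        "-q", queue,
        "-A", project,
        script,
    ]

cmd = build_qsub_command("run_sim.sh", nodes=128,
                         walltime="01:00:00",
                         queue="prod", project="MyProject")

# In a notebook cell one would hand this list to subprocess.run(cmd)
# and then poll the job's status; here we only show the assembled command.
print(shlex.join(cmd))
```

Wrapping submission and monitoring in small helper functions like this is what lets a notebook act as an interactive front end to the machine, rather than requiring users to work entirely in a terminal.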
What are the challenges you face in preparing Jupyter Notebooks for Aurora’s rollout? What is your strategy for preparing Jupyter for exascale?
I think the main challenge for the rollout of any state-of-the-art supercomputer is the user experience. While these systems are very capable in the abstract, utilizing their resources efficiently is a real challenge, particularly for the initial users. Our aim with Jupyter Notebooks is to lower the barrier for new users, providing them with a smoother experience. We will leverage interactive features such as Jupyter widgets to prepare self-contained tutorials that help users get started. A variety of teaching materials for ALCF workshops are now based on Jupyter Notebooks.
Who do you collaborate with as part of your work at the ALCF?
I started working at Argonne as a postdoctoral researcher in the Chemical Dynamics group of the Chemical Sciences and Engineering (CSE) division. I worked on a couple of projects involving scientists from CSE, Materials Science, Mathematics and Computer Science, and the ALCF. I continue to collaborate with these colleagues on related projects and proposals. In 2018, I joined the ALCF Data Science team and the Computational Science division. ALCF workshops and various compute allocation programs have enabled me to connect with researchers from all around the world. Just this April, we published a Nanoscale paper on using machine learning potentials with scientists from Belgium and Turkey, based on a collaboration that started at an earlier ALCF workshop. For the last three years, I have mainly been working on the NWChemEx ECP project. This is a very large project involving six national laboratories (Ames, Argonne, Brookhaven, Lawrence Berkeley, Oak Ridge, and Pacific Northwest national laboratories) and Virginia Tech. I would also like to mention the Sustainable Horizons Institute’s Sustainable Research Pathways program, which I participated in last year. This workforce development program enabled me to connect with researchers from underrepresented communities, and I am hosting three summer students this year.
How has your approach to preparing for exascale evolved over time?
Initially, I was mainly concerned with strong scaling and performance optimization. While that remains a very important area in HPC, over time my interest has shifted toward workflow development and the user experience. This area has become increasingly important with the growing interest in projects that aim to couple AI/ML with traditional simulations.
The Aurora Software Development Series examines the range of activities and collaborations that ALCF staff undertake to guide the facility and its users into the next era of scientific computing.
The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.
The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.