Many early career scientists, eager to learn to use the world’s most powerful supercomputers, anticipate acquiring essential skills for their careers in the computational science and engineering world by participating in the Argonne Training Program on Extreme-Scale Computing (ATPESC).
More than 600 scientists worldwide have participated in ATPESC, which is hosted by the U.S. Department of Energy’s (DOE) Argonne National Laboratory. A unique and intensive training program, ATPESC will mark its 10th anniversary this year, when it will be held from July 31 to August 12. The call for applications for the 2022 program has been extended to March 7, 2022.
Since its inception, ATPESC has been carefully designed for hands-on training on world-class supercomputers, while also providing tours of Argonne facilities and special networking events. The two-week program offers in-depth instruction on key skills, approaches, and tools needed to design, implement, and execute computational science and engineering applications on high performance computing (HPC) systems, including DOE’s upcoming exascale supercomputers.
“ATPESC has an important mission and I’m confident it will continue to be a valuable program for growing the community of researchers who use supercomputers for science in the ever-changing landscape of high performance computing,” said Raymond Loy, ATPESC director and Argonne Leadership Computing Facility (ALCF) lead for training, debuggers, and math libraries.
As part of the program, participants are given access to supercomputers at the ALCF, the Oak Ridge Leadership Computing Facility (OLCF), and the National Energy Research Scientific Computing Center (NERSC). The ALCF, OLCF, and NERSC are DOE Office of Science user facilities located at Argonne, Oak Ridge, and Lawrence Berkeley National Laboratories, respectively. ATPESC training sessions are led by some of the world’s foremost HPC experts from national laboratories, universities, and the computing industry.
ATPESC’s origin story
The idea to create ATPESC came from former Argonne Distinguished Fellow and ALCF Director of Science Paul Messina, who retired in 2019. A decade ago, he noticed that many users of advanced computer systems lacked the expertise required to use them effectively.
“Having been deeply involved in activities related to developing and using exascale systems, I knew that computer architectures would become more complex, as would the applications that would require exascale computing power,” said Messina.
Messina observed that universities did not have courses that covered the multiple facets of computational science research at the time. He had already addressed a similar situation in the late 1990s for grid computing, a precursor of cloud computing. In that case, he and several colleagues organized and held international two-week summer schools that covered the relevant topics and included substantial hands-on exercises.
Messina also was inspired by the advent of commercial parallel computers in the early 1980s. At the time, he was founding director of Argonne’s Mathematics and Computer Science division, which established the Advanced Computing Research Facility (ACRF) to enable scientists to learn parallel computing and experiment with emerging parallel architectures.
“With the ACRF, we had acquired several parallel computers with different architectures to begin research on parallel computing and many people became interested in learning to program them,” said Messina. “Having real parallel machines was a novelty.”
Four Argonne computing research pioneers — Rusty Lusk, Ross Overbeek, Danny Sorensen, and Jack Dongarra — started using ACRF systems to teach courses on parallel algorithms and programming. They then found themselves doing it over and over due to demand. Many people from the international computational community spent part of their summers at Argonne, where they were exposed to parallel computing for the first time, Messina said.
With ATPESC, Messina wanted to create a similar training program for high-end computational science. After conferring with some Argonne colleagues who agreed it would fill an important gap in HPC training, he pitched the idea to DOE’s Office of Advanced Scientific Computing Research and wrote a proposal for funding that was met with favorable reviews.
Next, Messina created an organizing committee that included Pete Beckman, Richard Coffey, Rusty Lusk, Michael Papka, Katherine Riley, and Rajeev Thakur, all from Argonne. The committee identified computing experts worldwide who would be invited to give lectures on topics including supercomputing architecture trends, mathematical software and numerical algorithms, visualization and data analysis, among others, Messina noted.
In 2013, about 60 participants were chosen for the first ATPESC. Although that number has grown slightly over the years, the class has been kept close to its original size to encourage substantial interactions between the students and the lecturers. The approach has paid off: participants find that they can ask in-depth questions, and lecturers provide equally comprehensive responses. In some cases, attendees work side by side with lecturers to apply new tools to their research applications.
“I remember having a participant from a DOE national laboratory, with 20 years of experience in computing, tell me at the end of the course that he had learned many useful things, was glad he attended, and he would encourage others from his lab to apply for future editions,” Messina said.
He was thrilled to learn that two graduate students who participated in the first ATPESC had papers accepted at the following year’s SC (Supercomputing Conference), the premier conference in this field and one with a low acceptance rate for papers. They credited ATPESC as a catalyst for their successful submissions.
“I also remember running into former students at various conferences, national labs, and universities where I was giving lectures and learning that many had received excellent jobs pursuing their research topics,” Messina said.
Changes come to ATPESC
When Messina was named director of DOE’s Exascale Computing Project in the fall of 2015, Marta García Martínez became the next ATPESC director. A principal project specialist in Argonne’s Computational Science division, she continued as director through 2019.
During her time with ATPESC, García Martínez worked to enhance the planning and execution of the program, including making a venue change in 2017. Over the years, a cascade of improvements to everything from logistics to the curriculum has continued to refresh the program to maximize the experience of all attendees.
“Despite the intensity of the program, the participants are always brimming with ideas and anticipating how they will apply what they’ve learned to their research,” García Martínez said.
Beyond the knowledge and advice attendees gain from the experts, she also noted the value of giving them an opportunity to get to know the lecturers and their ATPESC peers.
“The participants have really valued the hands-on component, which allows them to have one-on-one interactions with lecturers to discuss their research challenges and to brainstorm solutions,” said García Martínez.
“It is difficult to evaluate the impact of the program over time, but we know that our participants are facing their scientific challenges with more energy, new ideas, and increased knowledge,” she added.
Many past attendees have reached out to the ATPESC organizers to convey how the program has impacted their career paths.
Elizabeth M. Lee-Rausch, head of the Computational AeroSciences Branch of NASA’s Langley Research Center, for example, sent a thank-you note to García Martínez on behalf of one of her staff members, Chip Jackson, who attended ATPESC in 2019.
“(Chip) has expressed to me how valuable the sessions were during this year’s program and how it will help him in advancing his contributions to the NASA projects he is involved with,” wrote Lee-Rausch. “Our organization recognizes the importance of workforce development in the critical area of high performance computing, and we are grateful that several of our young researchers have been able to participate in your program in recent years. We are already realizing a tremendous payoff from this investment.”
Looking to the future
Raymond Loy has since taken the reins as ATPESC director.
Involved in various roles with ATPESC since it started, Loy found it fulfilling to help participants during the hands-on sessions in the early years of the program. Today, he enjoys crossing paths with participants later in their careers, some of whom have become faculty members and sent their own students to ATPESC.
A few ATPESC alumni have served as instructors in recent years, including Suyash Tandon, a software system design engineer at AMD, and Max Katz of NVIDIA, both of whom led ATPESC sessions in 2021.
“When I was a grad student, ATPESC provided me a thorough introduction to a number of new topics in high performance computing. It also helped me network with a number of leaders in the field who I wouldn’t otherwise have had the opportunity to meet and have a conversation with,” Katz said. “I am excited that I have been able to help give back to the program by being an instructor, and I am delighted that ATPESC is still going in its tenth year.”
In running ATPESC, Loy has tried to keep pace with the rapidly changing HPC world and to anticipate where it is heading. For example, the program increased training related to graphics processing units (GPUs), a technology that has become an integral part of many of the world’s top supercomputers, including DOE’s upcoming exascale systems. ATPESC attendees already have some experience using large-scale computers, so Loy’s goal is to make them more efficient at using powerful systems to conduct computational science and engineering research.
Perhaps one of his biggest challenges as the new director has been adjusting the program around the COVID-19 pandemic, which forced the organizers to switch gears to a virtual format in both 2020 and 2021.
“It was surely a trial by fire as the shutdown began,” Loy said. “While doing all the normal work for the in-person event, we had to decide on and set up a framework for conducting the event virtually.”
Loy and his team decided to reduce the length of each day from 10 to 12 hours to 8 hours. The shorter schedule was meant to accommodate participants attending from time zones spanning California to Finland and to limit videoconference fatigue.
Despite the switch to an online event, ATPESC still received numerous applications each year. While the virtual format was well received, the 2022 program is expected to return to in-person learning.
After 10 years, ATPESC continues to have an impact on the growing HPC community. To keep ahead of the innovative technology on the horizon, Loy and his team are reviewing the entire curriculum with an eye toward larger changes in the future.
“It’s critical that our curriculum reflects the latest HPC trends and technologies to ensure ATPESC continues to be a leading force in training the next generation of supercomputer users,” Loy said.
ATPESC is funded by the Exascale Computing Project, a collaborative effort of the DOE Office of Science’s Advanced Scientific Computing Research Program and the National Nuclear Security Administration.