Training current and prospective supercomputer users remains a core practice at the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility at DOE’s Argonne National Laboratory.
Through training events like the annual ALCF Computational Performance Workshop, the facility connects researchers from across the globe with staff and industry experts to help them improve application performance on its powerful supercomputers.
“The main goal of the workshop is to bring researchers together with ALCF staff experts who know how to get the most out of our supercomputers,” says Ray Loy, an ALCF lead for training, debuggers, and math libraries, who helped organize this year’s workshop.
In May, the ALCF workshop returned once again, helping more than 50 attendees boost application performance on ALCF systems in preparation for future allocation awards through programs such as DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program and the ASCR Leadership Computing Challenge (ALCC). Participants worked with ALCF and industry professionals through collaborative online sessions to benchmark, debug, and optimize their scientific computing codes on the facility’s supercomputers.
A key element of the workshop is the diverse range of topics packed into the three-day event. This year's topics included overviews of performance tools, portability with OpenMP, and a demonstration of profiling frameworks with TensorFlow and PyTorch.
Richard Loft, former Director of Technology Development at the National Center for Atmospheric Research, attended the workshop to learn how ALCF systems could help support his team’s work in the National Science Foundation-funded EarthWorks project. As part of the project, Loft is working with a group of scientists and software engineers to provide end-to-end workflow portability across the nation’s leadership computing systems to enable scientists to advance weather and climate research.
“The exascale hardware and software landscape is moving very quickly, and it is hard to hit a moving target, particularly so for a project like EarthWorks with an extensive code base,” says Loft. “So I was hoping that the workshop would provide the detailed ins and outs of using ALCF systems, a high-level sense of this software roadmap that the DOE is on, and a lot of hands-on experience with that software.”
To build on what he learned at the workshop, Loft applied for a Director's Discretionary (DD) allocation on behalf of the EarthWorks project after the workshop concluded. The team will now conduct a series of benchmarking and integration tests with the EarthWorks global storm-resolving model, mostly targeting ThetaGPU.
Another attendee, Maruti Mudunuru, an Earth scientist at DOE’s Pacific Northwest National Laboratory, came to the workshop to learn more about scaling deep learning applications. Mudunuru and his team use ALCF resources to develop deep learning models for calibrating multiphysics codes such as PFLOTRAN, SWAT, and GeoDT.
“We are now using the DeepHyper package to develop scalable machine learning models using GPUs for applications in geothermal energy, watershed hydrology, and estimating subsurface permeability for river water intrusion relevant to DOE’s missions,” says Mudunuru.
After attending the workshop, Mudunuru applied for the ALCF GPU Hackathon to learn how his research could benefit from the facility’s new Polaris system. He and his colleagues also plan to build on a DD allocation by performing scalability studies on ALCF’s GPU systems, with the goal of applying for larger allocations through the INCITE and ALCC programs in the coming year.
Slides and videos from the workshop are available through the ALCF website.
Stay tuned to the ALCF events webpage for details on upcoming facility workshops and training events.
The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.
The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.