ALCF workshop prepares researchers for Theta, Mira

2017 ALCF Computational Performance Workshop

For most supercomputer users, running science simulations on a leading-edge system for the first time requires more than just a how-to guide.

“There are special tools and techniques you need to know to take full advantage of these massive supercomputers,” said Sean Dettrick, lead computational scientist at Tri Alpha Energy, a California-based company pursuing the development of clean fusion energy technology.

Dettrick was one of more than 60 researchers who attended the Argonne Leadership Computing Facility’s (ALCF) Computational Performance Workshop, held May 2-5, 2017, for guidance on preparing and improving their codes for ALCF supercomputers, including Theta, the facility’s new 9.65-petaflops Intel-Cray system.

Every year, the ALCF hosts an intensive, hands-on workshop to connect both current and prospective users with the experts who know the systems inside out—ALCF computational scientists, performance engineers, data scientists, and visualization experts, as well as invited guests from Intel, Cray, Allinea (now part of ARM), ParaTools (TAU), and Rice University (HPCToolkit). With dedicated access to ALCF computing resources, the workshop provides an opportunity for attendees to work directly with these experts to test, debug, and optimize their applications on leadership-class supercomputers.

“The workshop is designed to help participants take their code performance to a higher level and get them computationally ready to pursue large-scale science projects on our systems,” said Ray Loy, the ALCF’s lead for training, debuggers, and math libraries. “This year, we had the added attraction of Theta, which previously had only been available to users in the Early Science Program.”

Theta will be opened up to the broader user community when it enters production mode on July 1, 2017. The new system will be available to researchers awarded projects through the 2017-2018 ASCR Leadership Computing Challenge (ALCC) and the 2018 Innovative and Novel Computational Impact on Theory and Experiment (INCITE) programs. One of the ALCF workshop’s goals is to help researchers demonstrate code scalability for INCITE and ALCC project proposals, which are required to convey both scientific merit and computational readiness.

For Dettrick and his colleagues, the workshop presented an opportunity to begin preparing for Theta. The team currently has a small Director’s Discretionary project at the ALCF, but they have their sights set on applying for a larger allocation through the INCITE program in the future.

“Our company has an in-house computing cluster that is like training wheels for the large supercomputers available here,” said Dettrick. “By moving some of our modeling work to ALCF systems, our goal is to inform and expedite our experimental research efforts by carrying out larger simulations more quickly.”

Working with ALCF staff members, the Tri Alpha Energy researchers were able to compile and run two plasma simulation codes on Theta. In the process, they worked with an Intel representative to use the Intel VTune performance profiler to identify and address some performance and scalability issues. The ALCF team suggested a number of strategies to improve threading of the codes and reduce I/O time on Theta.

“This experience definitely planted some seeds in my mind about how we can improve productivity moving forward,” Dettrick said.

Mark Kostuk, a mathematical modeler and optimizer from General Atomics’ Magnetic Fusion Energy Division, also brought a plasma code to the workshop to prepare for a future INCITE award. Initially, Kostuk encountered several intermittent run failures on Theta.

He was able to overcome the failures by working with several of the on-site experts. Using the Allinea DDT debugger, they identified one of the problems: memory errors appearing in calls to a math library. The collaborative effort continued into the following week, allowing Kostuk to pinpoint and fix the bug causing the run failures.

“It really worked out great. I received a lot of hands-on help with the code,” Kostuk said. “Once we resolved the issues, I was able to run a significant set of benchmarks and scaling tests as part of our preparations for INCITE.”

In addition to the hands-on sessions, the ALCF workshop featured talks on the facility’s system architectures, performance tools, optimization techniques, and data science capabilities (the full agenda and presentation slides are available online).

For Juan Pedro Mendez Granado, a postdoc at Caltech, the workshop provided a crash course in how to take advantage of leadership computing resources to advance his research into lithium-based batteries. Granado, a graduate of the 2016 Argonne Training Program for Extreme-Scale Computing, has been modeling the process of lithiation and delithiation for silicon anodes using a computing cluster that allows him to simulate hundreds of thousands of atoms at a time.

“With the ALCF’s supercomputers, I could simulate millions of atoms over much larger time scales,” he said. “Simulations at this scale would give us a much better understanding of the process at an atomistic level.”

Granado came to the workshop to explore his options for accessing ALCF computing systems. He left with intentions to apply for a Director’s Discretionary award to begin preparing for a more substantial award in the future.

“Not only does the workshop help participants improve their code performance, it also allows us to bring new researchers into the leadership computing pipeline,” said Loy of the ALCF. “Ultimately, we’re looking to grow the community of researchers who can use our systems for science.”

The ALCF is a U.S. Department of Energy Office of Science User Facility.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.
