Code-scaling workshop gets research teams ready for prime time

Author: Jim Collins

A code that models the interactions of electrons with X-ray pulses now runs about five times faster thanks to hands-on collaboration with Argonne Leadership Computing Facility (ALCF) staff at the “Scaling Your Science on Mira” workshop. The ALCF is a U.S. Department of Energy (DOE) Office of Science User Facility.

The facility’s annual scaling workshop, held May 24-26 at Argonne National Laboratory, welcomed both prospective and current users to the laboratory to work directly with ALCF computational scientists and invited experts on testing, debugging, and improving their codes on Mira, the ALCF’s 10-petaflops IBM Blue Gene/Q supercomputer. (The aforementioned code speedup was just one of the event’s success stories; see Highlighted Accomplishments below for more examples.)

“We find that this unique opportunity for face-to-face collaboration can help research teams make significant code improvements on the spot,” said Richard Coffey, ALCF Director of User Experience.

One of the workshop’s goals is to help researchers demonstrate code scalability for a future allocation award at the ALCF. The two main allocation programs for DOE leadership computing resources – Innovative and Novel Computational Impact on Theory and Experiment (INCITE) and ASCR Leadership Computing Challenge (ALCC) – require project proposals to convey both scientific merit and computational readiness. The ALCF’s scaling workshop is a key resource for many teams seeking to fulfill the latter requirement.

Current ALCF users also attend the annual workshop to learn tips and techniques for maximizing their time on the facility’s systems.

In addition to hands-on sessions, the event featured talks on the IBM Blue Gene/Q architecture, parallel I/O, data analysis, and various high-performance computing tools.

View the workshop agenda and links to the presentation slides.

Highlighted Accomplishments

  • A team from the Missouri University of Science and Technology improved the performance of hyperWENO, a compressible-flow direct numerical simulation solver. They identified and fixed an MPI bottleneck in the point-to-point communication of their halo exchange, resulting in an immediate 1.3x performance improvement. They also addressed an I/O bottleneck in parallel HDF5 and developed a strategy for adding OpenMP threading to their application; both changes are expected to enable further performance improvements.
  • An Argonne research team worked to improve a newly developed force decomposition scheme for a hybrid Monte Carlo/molecular dynamics algorithm in the LAMMPS code. This method, used to model the interactions of electrons with X-ray pulses, provides highly accurate results for validating common approximations. The team spent much of the workshop working with ALCF staff to debug and benchmark the new method. For an argon cluster containing 236,033 particles, the new force kernel is about 5x faster than the original decomposition algorithm in LAMMPS.
  • Researchers from an INCITE project led by Princeton University received hands-on assistance to advance their simulations of radiation-dominated accretion flows. ALCF staff and industry experts helped the researchers use the ParaView tool to visualize and analyze data from their simulations. The INCITE team also learned how to improve their code’s I/O performance and how the use of ensemble jobs could improve their simulation throughput.
  • A University of Southern California INCITE research team came to the workshop to improve the performance of their Replica Exchange Molecular Dynamics (REMD) code. Specifically, they were interested in improving the performance of their MD simulations when the optional stress tensor is computed. With help from ALCF staff, the researchers identified an area of the code where cache misses could be minimized. They changed the data structure to avoid those cache misses, and the stress tensor calculations now run 6x faster than in the original version.
  • A researcher from the United States Geological Survey attended the workshop to improve an Earth science code’s scalability in preparation for an INCITE proposal. The researcher’s application, used to model the continental United States, was having memory-related issues. ALCF staff helped implement an MPI I/O solution to more efficiently use the available memory on a single node.
  • A researcher from Argonne attended the workshop to learn about the various tools available for benchmarking and performance analysis on Mira. ALCF staff introduced the researcher to AutoPerf (a library that collects performance data for applications running on Mira), HPCToolkit (an open-source suite of tools for profile-based performance analysis of applications), and memlog (a tool used to trace memory usage).
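
For readers unfamiliar with the halo exchange mentioned in the hyperWENO item above, the following is a minimal sketch of the idea in plain Python. It is not hyperWENO code and uses no MPI: the "ranks" are simply list entries, the copy between neighbouring chunks stands in for point-to-point messages, and every function name is hypothetical. Each chunk of a periodic 1D grid carries one ghost cell per side, the ghost cells are filled from the neighbours, and a three-point averaging stencil is then applied locally.

```python
def serial_step(u):
    """One 3-point averaging step on a periodic 1D grid (reference result)."""
    n = len(u)
    return [(u[(i - 1) % n] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]


def split(u, nranks):
    """Split u into nranks equal chunks, each padded with one ghost cell per side."""
    if len(u) % nranks != 0:
        raise ValueError("grid size must be divisible by the number of ranks")
    size = len(u) // nranks
    return [[0.0] + u[r * size:(r + 1) * size] + [0.0] for r in range(nranks)]


def halo_exchange(chunks):
    """Fill each chunk's ghost cells from its periodic neighbours.

    In a real MPI code this copy would be a pair of point-to-point
    messages per neighbour; here it is an in-memory assignment.
    """
    nranks = len(chunks)
    for r, chunk in enumerate(chunks):
        left = chunks[(r - 1) % nranks]
        right = chunks[(r + 1) % nranks]
        chunk[0] = left[-2]    # last interior cell of the left neighbour
        chunk[-1] = right[1]   # first interior cell of the right neighbour


def local_step(chunk):
    """Apply the stencil to interior cells only, using the ghost cells as input."""
    interior = [
        (chunk[i - 1] + chunk[i] + chunk[i + 1]) / 3.0
        for i in range(1, len(chunk) - 1)
    ]
    return [0.0] + interior + [0.0]


def parallel_step(u, nranks):
    """Emulate one decomposed time step: split, exchange halos, update, gather."""
    chunks = split(u, nranks)
    halo_exchange(chunks)
    chunks = [local_step(c) for c in chunks]
    # Drop ghost cells and concatenate the interiors back into one grid.
    return [x for c in chunks for x in c[1:-1]]
```

Because the ghost cells supply exactly the neighbour values that the stencil needs, parallel_step(u, nranks) should reproduce serial_step(u) for any rank count that divides the grid size. In a production code the exchange step is where communication cost appears, which is why it is a common place to look for bottlenecks.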

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.