Argonne team demonstrates rapid cross-facility data processing

Argonne researchers Hannah Parraga (far right), Michael Prince (second from right) and Nicholas Schwarz (third from right) lead a demo at the SC23 conference on using integrated computing resources to accelerate discoveries at the Advanced Photon Source. (Image by Argonne National Laboratory)

The team’s research leverages a fully automated pipeline between the ALCF and the Advanced Photon Source to enable experiment-time data analysis.

As the volume of data generated by large-scale experiments continues to grow, the need for rapid data analysis capabilities is becoming increasingly critical to new discoveries. 

At the U.S. Department of Energy’s (DOE) Argonne National Laboratory, the co-location of the Argonne Leadership Computing Facility (ALCF) and the Advanced Photon Source (APS) provides an ideal proving ground for developing and testing methods to closely integrate supercomputers and experiments for near-real-time data analysis.

For over a decade, the ALCF and APS, both DOE Office of Science user facilities, have been collaborating to build the infrastructure for integrated ALCF-APS research, including work to develop workflow management tools and enable secure access to on-demand computing. In 2023, the team deployed a fully automated pipeline that uses ALCF resources to rapidly process data obtained from X-ray experiments at the APS.
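
The paper does not reproduce the pipeline's source code, but the general pattern such an automated pipeline follows can be sketched in a few lines of Python: watch for new detector output, transfer it to the computing facility, and submit a processing job. All paths, function names and the polling approach below are hypothetical stand-ins for illustration, not the actual ALCF-APS implementation.

```python
"""Minimal sketch of an automated cross-facility pipeline: watch a
beamline staging directory, ship new scans to an HPC facility and
queue a reconstruction job for each. Every name here is hypothetical."""
import time
from pathlib import Path

WATCH_DIR = Path("/aps/detector/output")   # hypothetical beamline staging area
SEEN: set[Path] = set()                    # scans already dispatched

def transfer_to_hpc(scan: Path) -> str:
    """Ship a scan to the computing facility and return its remote path.
    In practice this would be a managed data-transfer service."""
    remote = f"/alcf/staging/{scan.name}"  # hypothetical destination
    print(f"transferring {scan} -> {remote}")
    return remote

def submit_reconstruction(remote_path: str) -> None:
    """Queue a reconstruction job on the HPC system; stands in for a
    real scheduler or workflow-service submission."""
    print(f"submitting reconstruction job for {remote_path}")

def watch_loop(poll_seconds: float = 5.0) -> None:
    """Poll the detector output directory and dispatch each new scan."""
    while True:
        for scan in sorted(WATCH_DIR.glob("*.h5")):
            if scan not in SEEN:
                SEEN.add(scan)
                submit_reconstruction(transfer_to_hpc(scan))
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch_loop()
```

In a production setting, a polling loop like this would typically be replaced by event-driven triggers from a managed workflow service, which is what makes the pipeline fully automated end to end.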

To demonstrate the pipeline's capabilities, Argonne researchers carried out a study focused on Laue microdiffraction, a technique employed at the APS and other light sources to analyze materials with crystalline structures. The team used the ALCF's Polaris supercomputer to reconstruct data from an APS experiment, returning reconstructed scans to the APS within 15 minutes of the data being sent to the ALCF.

The researchers detailed their efforts in their article “Demonstrating Cross-Facility Data Processing at Scale With Laue Microdiffraction,” which was recognized with the Best Paper Award at the 5th Annual Workshop on Extreme-Scale Experiment-in-the-Loop Computing (XLOOP 2023) at the Supercomputing 2023 (SC23) conference in November. Led by APS software engineer Michael Prince, the team includes Doğa Gürsoy, Dina Sheyfer, Ryan Chard, Benoit Côté, Hannah Parraga, Barbara Frosik, Jon Tischler and Nicholas Schwarz.

Infrastructure for next-generation experimental research

Providing immediate motivation for the work is the ongoing APS Upgrade project (APS-U), which will deliver X-ray beams up to 500 times brighter than those of the original facility. The corresponding surge in experimental data will make high-performance computing (HPC) systems essential for processing and analyzing results quickly.

“After the APS Upgrade is complete, we anticipate that several techniques will require HPC to be successfully carried out, and we've been gearing up for this demand,” Prince said.

To this end, Schwarz — principal computer scientist at the APS — noted that the beamline technique introduced in the team’s paper will allow users to collect data about 10 times faster than is possible today.

“Ultimately we're building a common infrastructure that enables the APS, and potentially other facilities, to use ALCF supercomputing resources, and more broadly strengthen the connection between large-scale supercomputing centers and laboratories for experimental and observational science,” Schwarz said. “This means that we are now able to connect instruments to supercomputing centers that previously were never able to take advantage of large-scale HPC resources. That is, we can now incorporate and use supercomputing centers as an integral part of the experiment loop.”

As such, the work has benefited from, and is contributing to, Argonne's broader Nexus effort. Under Nexus, Argonne researchers are working to advance DOE's vision of an integrated research infrastructure that seamlessly connects DOE experimental facilities with its world-class supercomputing resources.

“Our automated pipeline for processing APS data using ALCF resources was built with infrastructure and tools being deployed between the two facilities as part of Nexus,” Prince explained. “Our work also involved parallelizing the underlying reconstruction algorithm via MPI and CUDA implementations to better use Polaris hardware and shorten run times. We were then able to conduct a series of demonstrations at scale, using up to 50 Polaris nodes simultaneously. The system was capable of keeping up with the data generation rate, processing scans as they came in every one to two minutes over the course of six- to 12-hour runs.”
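
To illustrate the kind of scan distribution Prince describes, here is a minimal mpi4py sketch that statically assigns incoming scans across MPI ranks and gathers the results on rank 0. The reconstruct() placeholder and the mocked scan list are hypothetical, the round-robin assignment is just one simple strategy rather than the paper's exact scheme, and the CUDA-accelerated per-scan kernel is elided.

```python
"""Sketch of distributing per-scan reconstruction work across MPI ranks,
assuming mpi4py is installed and launched via mpiexec. Hypothetical names."""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def reconstruct(scan_id: int) -> float:
    """Placeholder for the Laue reconstruction kernel (hypothetical);
    the real kernel would be CUDA-accelerated on Polaris GPUs."""
    return float(scan_id) ** 0.5  # stand-in computation

# Rank 0 would receive the scan list from the pipeline; here it is mocked.
scans = list(range(100)) if rank == 0 else None
scans = comm.bcast(scans, root=0)

# Static round-robin assignment: rank r handles scans r, r+size, r+2*size, ...
local_results = {s: reconstruct(s) for s in scans[rank::size]}

# Gather the partial results on rank 0 and merge them.
gathered = comm.gather(local_results, root=0)
if rank == 0:
    merged = {k: v for part in gathered for k, v in part.items()}
    print(f"reconstructed {len(merged)} scans across {size} ranks")
```

Run with, for example, `mpiexec -n 4 python reconstruct_sketch.py`; keeping up with scans arriving every one to two minutes then becomes a matter of sizing the rank count to the per-scan reconstruction time.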

Broad ramifications across disciplines

The team’s results carry implications for future software development, engineering and beamline science.

“Through these demonstrations and construction of this software, we learned a lot of lessons on how to develop and deploy these kinds of workflows going forward. It was also a good opportunity to exercise the infrastructure in and between the ALCF and APS. As for the Laue microdiffraction technique itself, the parallelization, optimization and deployment onto Polaris has enabled full-scale analysis of microdiffraction data,” Prince said.

“The work is a good proof of concept for how large-scale processing can be done between the ALCF and APS,” added Chard, co-author and an Argonne computer scientist. “Additionally, we were able to exercise much of the infrastructure that we expect we'll need to use to handle post-APS-U processing.”

Furthermore, Prince said that the full-scale reconstructions produced with Polaris were subsequently useful in improving the underlying beamline technique.

“The results have been used to inform how we want to build the next iteration of apertures and the reconstruction algorithm itself. In fact, we're currently using some of the infrastructure we built to run reprocessing on a new iteration of the reconstruction algorithm and parameters,” he said.

The research team is now building user-facing web portals so that beamline scientists and facility users can easily manage their data and launch reconstructions without having to dig into code or scripts.

“We also want to make the pipeline system more general and fault-tolerant for production use. Once the new reconstruction algorithm is validated, we'll need to do another optimization pass to bring run times down, hopefully faster than what we were using during the demonstrations,” Prince said. “We aim to have these tools ready by the time the upgraded APS is fully online and experiments begin. We also hope what we've developed and the lessons we've learned along the way can help guide future processing techniques between the ALCF and APS.”