Close computation from far away: On-demand analysis fuels frontier science

Laura Wolf, Argonne Leadership Computing Facility


Nuclear fusion has long promised to provide a safe and clean source of virtually limitless energy. Developing efficient devices that fulfill that potential, sustainably, demands a scientific and engineering effort that is a major area of plasma physics research today. Researchers at the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy Office of Science User Facility, have helped to accelerate this effort by providing near-real-time data analysis for the experiments used to design such devices.

A tokamak is an experimental machine designed to harness the energy of fusion using a powerful magnetic field to confine plasma in the shape of a torus (a doughnut-like shape). Achieving a stable plasma equilibrium requires precise control of the magnetic field lines that wind around the torus in a helical shape.

Scientists at the DIII-D National Fusion Facility, a DOE Office of Science User Facility in San Diego, California, conduct fast-paced plasma physics experiments by creating 6-second shots, or ‘pulses’, of confined plasma every 15-20 minutes. Each new pulse can be planned using an analysis of the previous pulse, produced by a fusion science analysis code called SURFMN running on a local computing resource. However, because the fine-grid analysis itself takes 20 minutes to complete, it cannot finish before the next pulse begins, so every other discharge must proceed without this detailed SURFMN analysis.

Computer scientists at Argonne National Laboratory solved the San Diego team’s timing mismatch problem by automating and shifting the analysis step to Cooley, a powerful data analysis cluster at the ALCF. Although this cluster is located some 2,000 miles away from the experiment, it was able to compute the analysis of every single pulse and return the results to the research team in a fraction of the time required by the computing resource in San Diego.

The ALCF already has the computing and networking infrastructure in place to couple itself easily to an experiment happening elsewhere. Integrating that infrastructure into a real-time experiment, however, required one addition: a service that waits for the experiment’s data to become available in San Diego, retrieves it, analyzes it, and then sends the result back to the DIII-D scientists to calibrate the next experiment.
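The four steps of that service (wait, retrieve, analyze, return) can be pictured as a simple polling loop. The Python sketch below is illustrative only: the article does not describe the real interfaces, so the function run_between_shot_service and its three callables (fetch_new_shot, analyze, send_result) are hypothetical stand-ins, and the "analysis" here is a trivial average rather than anything resembling SURFMN.

```python
import time
from typing import Callable, Optional


def run_between_shot_service(
    fetch_new_shot: Callable[[], Optional[dict]],
    analyze: Callable[[dict], dict],
    send_result: Callable[[dict], None],
    max_polls: int,
    poll_interval_s: float = 0.0,
) -> int:
    """Poll for new shot data, analyze each shot, and send the result back.

    All three callables are hypothetical placeholders; they are not the
    actual Balsam or DIII-D interfaces.
    """
    processed = 0
    for _ in range(max_polls):
        shot = fetch_new_shot()          # wait for / retrieve the shot data
        if shot is None:
            time.sleep(poll_interval_s)  # nothing available yet; poll again
            continue
        result = analyze(shot)           # run the analysis on the remote cluster
        send_result(result)              # return the result for the next pulse
        processed += 1
    return processed


# In-memory demo with made-up shot records (None means "no new data yet").
pending = [
    None,
    {"shot": 1, "signal": [1.0, 2.0, 3.0]},
    None,
    {"shot": 2, "signal": [4.0, 6.0]},
]
returned = []

count = run_between_shot_service(
    fetch_new_shot=lambda: pending.pop(0) if pending else None,
    analyze=lambda s: {"shot": s["shot"], "mean": sum(s["signal"]) / len(s["signal"])},
    send_result=returned.append,
    max_polls=6,
)
print(count, returned)
```

Passing the three steps in as callables keeps the loop independent of any particular experiment, which mirrors how the article describes the same service being pointed at different systems.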

The new service leverages code developed for a previous collaboration with the Large Hadron Collider (LHC), a large high-energy physics experiment in Switzerland. In 2015, ALCF computer scientist Thomas Uram and Argonne physicist Taylor Childers used Mira to simulate particle collision events generated by LHC experiments, with an award of fifty million computing hours. To facilitate the flow of jobs between the LHC and the ALCF, the Balsam service was developed to interface seamlessly with LHC systems and execute the workflow on ALCF resources. For DIII-D, Balsam was adapted to interact with the DIII-D systems and to run fusion analyses instead.

Now, a typical SURFMN run takes just under three minutes to complete, using just 21 of Cooley’s 126 compute nodes. The additional computing power also allowed the SURFMN runs to be conducted with higher complexity, using a finer grid for the Fourier analysis and yielding more precise results than the DIII-D team typically obtained on its local systems.
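The effect of those numbers on the experiment's rhythm can be checked with a little arithmetic. The Python sketch below uses only figures quoted in this article (a 15-20 minute shot cadence, a 20-minute local analysis, a remote run of just under three minutes, 21 of 126 nodes). The scheduling model is a simplifying assumption, not something the article states: one analysis runs at a time, its result must arrive strictly before the next shot begins, and data transfer time is ignored. The helper name shots_per_analysis is likewise made up for illustration.

```python
def shots_per_analysis(cadence_min: int, analysis_min: int) -> int:
    """Return n such that only every n-th shot can be analyzed in time.

    Simplified model (an assumption): one analysis at a time, and its
    result must be ready strictly before the next shot starts.
    """
    return analysis_min // cadence_min + 1


LOCAL_ANALYSIS_MIN = 20   # fine-grid analysis on the local resource
REMOTE_ANALYSIS_MIN = 3   # "just under three minutes", so an upper bound

for cadence in (15, 20):  # shots occur every 15-20 minutes
    local_n = shots_per_analysis(cadence, LOCAL_ANALYSIS_MIN)
    remote_n = shots_per_analysis(cadence, REMOTE_ANALYSIS_MIN)
    print(f"cadence {cadence} min: local every {local_n} shot(s), "
          f"remote every {remote_n} shot(s)")

# Share of the cluster used by one run: 21 of 126 nodes is one sixth.
node_fraction = 21 / 126
print(f"node share per run: {node_fraction:.1%}")
```

Under this model the local analysis keeps up with only every second shot at either end of the cadence range, which matches the article's "every other discharge," while the remote run fits between every pair of shots with time to spare.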

This is the first instance of an automatically triggered, between-shot fusion science analysis code running on-demand at a remotely located high-performance computing resource. For the science team, the computation enables higher resolution analysis to be completed faster, and with improved accuracy. For the team at the ALCF, this collaboration represents a new paradigm of the high-performance computing center as a service facility.

Merging the Machines of Modern Science

The Office of Science’s scientific user facilities offer capabilities unrivaled anywhere in the world. These facilities provide the state-of-the-art instruments, large-scale computers, and measurement equipment needed by the research community to push the frontiers of science. Tens of thousands of researchers from academia, industry, and national laboratories develop and conduct research campaigns using these resources.

Research teams tackle hefty mission-related energy challenges and environmental problems, and help ensure the integrity of the nation’s nuclear weapons. At the same time, industry-based engineers design new airplane wings, university-based biologists design tailor-made drug molecules, and government laboratory-based scientists discover and design new materials.

Work conducted on DOE user facility resources also contributes knowledge to large, international scientific experiments, such as the DIII-D research program operated by private contractor General Atomics for the DOE through the Office of Fusion Energy Sciences.

DIII-D is a large international program with nearly 100 participating institutions and a research team of over 500 users, all working to establish the scientific basis for the optimization of the tokamak approach to fusion energy production. So far, the DIII-D program has helped inform the redesign of ITER, a worldwide fusion effort now under construction in southern France, including the development of the physics basis for key ITER issues and advanced ITER operation.

DOE user facilities are widely distributed, which can make collaborative research involving large resources separated by a wide area network a challenge. The need for real-time or near-real-time analysis of experimental results, for example to calibrate an instrument or to focus on a particular area of interest, adds a further requirement: a service layer that establishes a virtual “iterative loop” between the facilities, or even between instruments.

High-performance computing centers are well-suited to providing on-the-fly data analysis of massive data sets. The ALCF and DIII-D collaboration represents a real-time integration with an active experiment, in this case to explore magnetic confinement of fusion energy at another DOE User Facility, but this service could be extended to large-scale experiments anywhere, such as sky surveys in New Mexico and Chile or large physics experiments in Europe.

The Evolving Leadership Computing Center

Along with enabling scientific applications at the exascale, leadership computing facilities like the ALCF will evolve from a model of primarily providing computing cycles for simulations to a model that also provides big data analytics and machine learning across a wide variety of science and engineering domains and disciplines, including experimental and observational domains.

This advance in computing power will accelerate scientific simulations far beyond what is possible today, while also providing necessary computing resources to large-scale experimental and observational scientific facilities. Integration with the DIII-D facility is an early example of this future.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.