Argonne’s pioneering computing program pivots to exascale

When it comes to the breadth and range of the U.S. Department of Energy’s (DOE) Argonne National Laboratory’s contributions to the field of high-performance computing (HPC), few if any other organizations come close. Argonne has been building advanced parallel computing environments and tools since the 1970s. Today, the laboratory serves as both an expertise center and a world-renowned source of cutting-edge computing resources used by researchers to tackle the most pressing challenges in science and engineering.

Since its digital automatic computer days in the early 1950s, Argonne has been interested in designing and developing algorithms and mathematical software for scientific purposes, such as the Argonne Subroutine Library in the 1960s and the so-called “PACKs” – e.g., EISPACK, LINPACK, MINPACK and FUNPACK – as well as Basic Linear Algebra Subprograms (BLAS) in the 1970s. In the 1980s, Argonne established a parallel computing program – nearly a decade before computational science was explicitly recognized as the new paradigm for scientific investigation and the government inaugurated the first major federal program to develop the hardware, software and workforce needed to solve “grand challenge” problems.

A place for experimenting and community building

By the late 1980s, the Argonne Computing Research Facility (ACRF) housed as many as 10 radically different parallel computer designs – nearly every emerging parallel architecture – on which applied mathematicians and computer scientists could explore algorithm interaction, program portability and parallel programming tools and languages. By 1987, Argonne was hosting a regular series of hands-on training courses on ACRF systems for attendees from universities, industry and research labs.

In 1992, at DOE’s request, the laboratory acquired an IBM SP – the first scalable, parallel system to offer multiple levels of input/output (I/O) capability essential for increasingly complex scientific applications – and, with that system, shifted its focus to experimental production machines. Argonne’s High-Performance Computing Research Center (1992–1997) pursued production-oriented parallel computing for grand challenge problems alongside computer science research, and emphasized collaboration with computational scientists. By 1997, Argonne’s supercomputing center was recognized by the DOE as one of the nation’s four high-end resource providers.

Becoming a leadership computing center

In 2002, Argonne established the Laboratory Computing Resource Center and in 2004 formed the Blue Gene Consortium with IBM and other national laboratories to design, evaluate and develop code for a series of massively parallel computers. The laboratory installed a 5-teraflop IBM Blue Gene/L in 2005, a prototype and proving ground for what in 2006 would become the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science User Facility. Along with another leadership computing facility at Oak Ridge National Laboratory, the ALCF was chartered to operate some of the fastest supercomputing resources in the world dedicated to scientific discovery.

In 2007, the ALCF installed a 100-teraflop Blue Gene/P and began to support projects under the Innovative and Novel Computational Impact on Theory and Experiment program. In 2008, ALCF’s 557-teraflop IBM Blue Gene/P, Intrepid, was named the fastest supercomputer in the world for open science (and third fastest machine overall) on the TOP500 list and, in 2009, entered production operation. Intrepid also topped the first Graph 500 list in 2010 and again in 2011. In 2012, ALCF’s 10-petaflop IBM Blue Gene/Q, Mira, ranked third on the June TOP500 list and entered production operation in 2013.

Next on the horizon: exascale

Argonne is part of a broader community working to achieve a capable exascale computing ecosystem for scientific discoveries. The benefit of exascale computing – computing capability that can achieve at least a billion billion operations per second – lies primarily in the applications it will enable. To take advantage of this immense computing power, Argonne researchers are contributing to the emerging convergence of simulation, big data analytics and machine learning across a wide variety of science and engineering domains and disciplines.

In 2016, the laboratory launched an initiative to explore new ways to foster data-driven discoveries, with an eye to growing a new community of HPC users. The ALCF Data Science Program, the first of its kind in the leadership computing space, targets users with “big data” science problems and provides time on ALCF resources, staff support and training to improve computational methods across all scientific disciplines.

In 2017, Argonne launched an Intel/Cray machine, Theta, doubling the ALCF’s capacity to do impactful science. The facility now operates at the frontier of data-centric and high-performance supercomputing.

Argonne researchers are also getting ready for the ALCF’s future exascale system, Aurora, expected in 2021. Using innovative technologies from Intel and Cray, Aurora will provide over 1,000 petaflops for research and development in three areas: simulation-based computational science; data-centric and data-intensive computing; and learning – including machine learning, deep learning, and other artificial intelligence techniques.
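To put "a billion billion operations per second" in perspective, the short Python sketch below estimates how long 10^18 operations would take on each system named in this article. It is an idealized illustration only: it assumes each machine sustains its quoted peak speed, which real applications rarely do, and the names `PEAK_TERAFLOPS` and `seconds_for` are hypothetical helpers introduced here, not part of any Argonne software.

```python
# Peak speeds quoted in this article, in teraflops
# (1 teraflop = 10**12 operations per second).
PEAK_TERAFLOPS = {
    "Blue Gene/L (2005)": 5,
    "Blue Gene/P (2007)": 100,
    "Intrepid (2008)": 557,
    "Mira (2012)": 10_000,          # 10 petaflops
    "Aurora (planned)": 1_000_000,  # 1,000 petaflops, i.e. one exaflop
}

EXA = 10**18  # "a billion billion" operations


def seconds_for(operations: int, teraflops: int) -> float:
    """Idealized run time, assuming the machine sustains its peak rate."""
    return operations / (teraflops * 10**12)


if __name__ == "__main__":
    for name, teraflops in PEAK_TERAFLOPS.items():
        print(f"{name}: {seconds_for(EXA, teraflops):,.0f} s")
```

Under these assumptions, a workload that would occupy the 2005 Blue Gene/L for 200,000 seconds (more than two days) would take about one second on a 1,000-petaflop system.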

The ALCF has already inaugurated an Early Science Program to prepare key applications and libraries for the innovative architecture. Moreover, ALCF computational scientists and performance engineers are working closely with Argonne’s Mathematics and Computer Science (MCS) division as well as its Computational Science and Data Science and Learning divisions with the aim of advancing the boundaries of HPC technologies ahead of Aurora. (The MCS division is the seedbed for such groundbreaking software as BLAS3, p4, Automatic Differentiation of Fortran Codes (ADIFOR), the PETSc toolkit of parallel computing software, and a version of the Message Passing Interface known as MPICH.)

The ALCF also continues to add new services, helping researchers near and far to manage workflow execution of large experiments and to co-schedule jobs between ALCF systems, thereby extending Argonne’s reach even further as a premier provider of computing and data analysis resources for the scientific research community.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.
