Storage and Networking

Data Storage

The ALCF’s data storage system is used to retain the data generated by simulations and visualizations. Disk storage provides intermediate-term storage for active projects, offering a means to access, analyze, and share simulation results. Tape storage is used to archive data from completed projects.

Disk Storage

The ALCF uses Lustre and GPFS file systems for data storage. The Lustre file systems described here are:

Flare:

A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 petabytes (PB) of usable capacity across 8,480 disk drives. This ClusterStor platform provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650 GB/s.

Flare is used primarily for compute campaign storage. Also see File Systems, Data Sharing, Data Policy, and Data Transfer.

Eagle:

A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 petabytes (PB) of usable capacity across 8,480 disk drives. This ClusterStor platform also provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650 GB/s.

Eagle provides support for campaign storage as well as data sharing via Globus collections. Also see File Systems, Data Sharing, Data Policy, and Data Transfer.
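As a rough illustration of what the ClusterStor figures quoted above imply per Object Storage Target (OST), the sketch below divides the totals evenly across the 160 OSTs. It assumes decimal units (1 PB = 1,000 TB) and an even spread of drives, capacity, and bandwidth, which the source does not state; some drives may serve other roles, so treat the results as averages only. The function and constant names are illustrative.

```python
# Figures quoted above for each ClusterStor E1000 file system.
USABLE_CAPACITY_PB = 100          # usable capacity in petabytes
DISK_DRIVES = 8480                # total disk drives
OBJECT_STORAGE_TARGETS = 160      # number of OSTs
AGGREGATE_RATE_GB_PER_S = 650     # aggregate transfer rate in gigabytes per second


def per_ost_averages():
    """Return (drives, capacity in TB, bandwidth in GB/s) averaged per OST.

    Assumption: everything is spread evenly across OSTs, and 1 PB = 1000 TB.
    """
    drives = DISK_DRIVES / OBJECT_STORAGE_TARGETS
    capacity_tb = USABLE_CAPACITY_PB * 1000 / OBJECT_STORAGE_TARGETS
    rate_gb_per_s = AGGREGATE_RATE_GB_PER_S / OBJECT_STORAGE_TARGETS
    return drives, capacity_tb, rate_gb_per_s


if __name__ == "__main__":
    drives, capacity_tb, rate = per_ost_averages()
    print(f"{drives:.0f} drives, {capacity_tb:.0f} TB, {rate:.2f} GB/s per OST")
```

Under these assumptions each OST averages 53 drives, 625 TB, and about 4 GB/s, which suggests that large files need to be spread over many OSTs to approach the aggregate rate.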

Tape Storage

ALCF computing resources share three 10,000-slot libraries using LTO-8 tape technology. The LTO tape drives have built-in hardware compression with compression ratios typically between 1.25:1 and 2:1, depending on the data, giving an effective capacity of approximately 186 PB. Also see Data Transfer and HPSS.
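To make the compression range above concrete, the sketch below estimates how much physical tape a dataset would occupy at a given compression ratio. It is a simple division and not a description of how the archive accounts for space; the function name and the 100 TB dataset are hypothetical.

```python
def tape_space_needed_tb(data_tb, compression_ratio):
    """Estimate physical tape space (TB) used by data_tb of uncompressed data.

    compression_ratio is the N in "N:1", e.g. 1.25 or 2.0.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return data_tb / compression_ratio


if __name__ == "__main__":
    dataset_tb = 100  # hypothetical archive size
    best = tape_space_needed_tb(dataset_tb, 2.0)     # highly compressible data
    worst = tape_space_needed_tb(dataset_tb, 1.25)   # poorly compressible data
    print(f"{dataset_tb} TB occupies between {best:.0f} and {worst:.0f} TB of tape")
```

For the hypothetical 100 TB dataset, the quoted range of ratios corresponds to roughly 50 to 80 TB of tape. Data that is already compressed may see little or no further reduction.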

Networking

Networking is the fabric that ties all of the ALCF’s computing systems together.

InfiniBand enables communication between system I/O nodes and the storage systems described above. The production HPC SAN is built on NVIDIA Mellanox High Data Rate (HDR) InfiniBand hardware. Two 800-port core switches provide the backbone links between 80 edge switches, yielding 1,600 available host ports, each at 200 Gbps, in a non-blocking fat-tree topology. The full bisection bandwidth of this fabric is 320 Tbps. The HPC SAN is managed by the NVIDIA Mellanox Unified Fabric Manager (UFM), which provides adaptive routing to avoid congestion, together with the NVIDIA Mellanox Self-Healing Interconnect Enhancement for Intelligent Datacenters (SHIELD) resiliency system for link fault detection and recovery.
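The fabric figures above are mutually consistent, which the following sketch checks. It assumes that every core switch port is used as an uplink from an edge switch and that the quoted 320 Tbps is the host port count multiplied by the port speed; neither assumption is stated in the source, and the names are illustrative.

```python
# Figures quoted above for the HDR InfiniBand fabric.
CORE_SWITCHES = 2
PORTS_PER_CORE_SWITCH = 800
EDGE_SWITCHES = 80
HOST_PORTS = 1600
PORT_SPEED_GBPS = 200  # gigabits per second per port


def fabric_summary():
    """Derive per-edge-switch port counts and the aggregate host bandwidth."""
    host_ports_per_edge = HOST_PORTS // EDGE_SWITCHES
    # Assumption: all core ports serve as uplinks from the edge switches.
    core_ports = CORE_SWITCHES * PORTS_PER_CORE_SWITCH
    uplinks_per_edge = core_ports // EDGE_SWITCHES
    return {
        "host_ports_per_edge": host_ports_per_edge,
        "uplinks_per_edge": uplinks_per_edge,
        # Non-blocking means at least as many uplinks as host ports per edge switch.
        "non_blocking": uplinks_per_edge >= host_ports_per_edge,
        "aggregate_tbps": HOST_PORTS * PORT_SPEED_GBPS / 1000,
    }


if __name__ == "__main__":
    for name, value in fabric_summary().items():
        print(f"{name}: {value}")
```

With these inputs each edge switch has 20 host ports and 20 uplinks, so the 1:1 ratio expected of a non-blocking fat tree holds, and 1,600 ports at 200 Gbps give the quoted 320 Tbps.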

When external communications are required, Ethernet is the interconnect of choice. Remote user access, systems maintenance and management, and high-performance data transfers are all enabled by the local area network (LAN) and wide area network (WAN) Ethernet infrastructure. This connectivity is built on a combination of Extreme Networks SLX and MLXe routers and NVIDIA Mellanox Ethernet switches.

ALCF systems connect to other research institutions over multiple 100 Gbps Ethernet circuits that link to many high-performance research networks, including local and regional networks like the Metropolitan Research and Education Network (MREN), as well as national and international networks like the Energy Sciences Network (ESnet) and Internet2.
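For a sense of scale, the sketch below estimates the best-case time to move a dataset over a single 100 Gbps circuit. It is an idealized lower bound: real transfers also depend on storage speed, the remote endpoint, protocol overhead, and competing traffic. The `efficiency` parameter and the 1 PB dataset are hypothetical, and decimal units are assumed (1 TB = 10^12 bytes).

```python
def ideal_transfer_hours(data_tb, link_gbps=100.0, efficiency=1.0):
    """Best-case hours to move data_tb terabytes over a link_gbps circuit.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the line
    rate that a transfer actually achieves.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = data_tb * 1e12 * 8                      # terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    hours = ideal_transfer_hours(1000)  # hypothetical 1 PB dataset
    print(f"1 PB over one 100 Gbps circuit: at least {hours:.1f} hours")
```

At full line rate, a hypothetical 1 PB dataset takes a little over 22 hours on one circuit, which is one reason multiple circuits and parallel transfer tools matter for large campaigns.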

Additional Information on Storage

Data Policy

Disk Quota

Transferring Data Using Globus