Data Policy


ALCF Data Confidentiality

The Argonne Leadership Computing Facility (ALCF) network is an open-research network. Because our resources and networks are open to many users and cannot be protected at a partitioned level, we cannot guarantee complete security for any data that resides here. It is up to users to provide the security they need.

The basic level of protection provided is UNIX file level permissions; it is the user's responsibility to ensure that file permissions and umasks are set to match their needs.

NOTE: By default, permissions and umasks leave files group- and world-readable. For help determining or setting file permissions or umasks, or creating a UNIX group, contact support@alcf.anl.gov.
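As a minimal sketch of how a user might inspect and tighten these settings from Python, the following uses only the standard library. The helper names (current_umask, is_group_or_world_accessible, restrict_to_owner) are illustrative and are not ALCF-provided tools.

```python
import os
import stat


def current_umask() -> int:
    """Return the process umask without changing it permanently."""
    mask = os.umask(0)  # os.umask sets a new mask and returns the previous one
    os.umask(mask)      # restore the previous mask right away
    return mask


def is_group_or_world_accessible(path: str) -> bool:
    """Return True if any group or other permission bit is set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def restrict_to_owner(path: str) -> None:
    """Clear all group and other permission bits, keeping the owner bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IRWXG | stat.S_IRWXO))
```

Calling os.umask(0o077) early in a script has a similar effect for newly created files: files the process creates afterwards carry no group or world permission bits.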

ALCF Staff with Root Privileges

ALCF resource administrators with root privileges are not constrained by the file permissions, and they have the capability to open and/or copy all files on the system. They can also assume a user’s identity on the system.

ALCF staff will never copy, expose, discuss, or in any other way communicate your project information to anyone outside of your project, the ALCF, or Argonne National Laboratory cybersecurity officials without your explicit permission unless required to do so by law. It is your responsibility to encrypt data if you wish to prevent its exposure under those circumstances.

Administrators use these elevated privileges only on certain highly restricted machines and, generally speaking, only when requested or when a problem or security issue is suspected. ALCF staff might look at your files in the following instances:

  • We maintain copies of all .error, .output, and Cobalt log files and may review them to determine if a job failure was due to user error or a system failure.
  • If you request our assistance via any mechanism (for example, a help ticket, a direct personal email, or an in-person conversation), we interpret that request to be explicit permission to view your files if we think doing so will aid us in resolving your issue.

Use of Proprietary/Licensed Software

All software used on ALCF computers must be properly acquired and used in accordance with its license. Possession or use of illegally copied software is prohibited. Likewise, users shall not copy copyrighted software, except as permitted by the owner of the copyright. Currently, the use of export-controlled codes is prohibited.

Prohibited Data

The ALCF computer systems are operated as research systems and contain only data related to scientific research. Use of ALCF resources to store, manipulate, or remotely access any sensitive or national security information is prohibited. This includes, but is not limited to, personally identifiable information (data that falls under the Privacy Act of 1974, 5 U.S.C. 552a), classified information, unclassified controlled nuclear information (UCNI), naval nuclear propulsion information (NNPI), and information related to the design or development of nuclear, biological, or chemical weapons or any other weapons of mass destruction. The use of ALCF resources for personal or non-work-related activities is also prohibited.

Export Control

All principal investigators using ALCF resources and ALCF staff members working with project teams are responsible for knowing whether their project generates any of these prohibited data types or information that falls under Export Control. For questions, contact the ALCF Support Team at support@alcf.anl.gov.

Data Storage Systems

Data stored for any length of time on ALCF resources should only be data directly related to work done on any of the ALCF leadership computing systems. Specific policies apply to the three types of data storage systems maintained at ALCF. Read these policies carefully and plan accordingly in terms of space, usage, and data protection.

Home File System Space

The home file system is intended to hold your executable files, configuration files, and similar items. It is NOT meant to hold the output from your application runs (use the data/parallel file system for that purpose). The home file system space is generally moderate in size and is the best protected. Because of its moderate size, backups are practical.

There are two forms of backup. First, the system takes nightly snapshots of your home directory tree, so you can recover accidentally deleted files or previous versions of files with the cp command. Snapshots for Mira, Cetus, and Cooley are located at /gpfs/mira-home/.snapshots/<dayofweek>/<username>. For Vesta, they are located at /gpfs/vesta-home/.snapshots/<dayofweek>/<username>. These snapshots are stored on the same file system and do not protect against disk failure. Second, the system performs tape backups, which enable recovery of files more than seven days old or recovery from a catastrophic disk failure.

The table below indicates the capabilities and characteristics of each file system. Because data replication is enabled for the /home file system, usable capacity is half of the enforced quota limit.
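The snapshot recovery described above can also be scripted. The following is a rough sketch in Python, equivalent to a single cp from the snapshot tree. The function name restore_from_snapshot and its parameters are hypothetical, and the dayofweek value must match whatever directory names actually exist under the snapshot root, since this policy does not spell them out.

```python
import shutil
from pathlib import Path

# Snapshot roots quoted in this policy. The layout beneath them,
# <dayofweek>/<username>, is taken from the same text.
MIRA_SNAPSHOT_ROOT = Path("/gpfs/mira-home/.snapshots")
VESTA_SNAPSHOT_ROOT = Path("/gpfs/vesta-home/.snapshots")


def restore_from_snapshot(snapshot_root, dayofweek, username, relative_path, home_dir):
    """Copy one file from a nightly snapshot back into the home directory.

    dayofweek is an assumption supplied by the caller; list the snapshot
    root to see which directory names are available.
    """
    source = Path(snapshot_root) / dayofweek / username / relative_path
    target = Path(home_dir) / relative_path
    if not source.is_file():
        raise FileNotFoundError(f"no snapshot copy at {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves timestamps and mode
    return target
```

Because the snapshot root is a parameter, the same function works for either path above and can be exercised against a scratch directory before it is used on real data.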

Data/Parallel File System Space

The data/parallel file system is intended primarily for results output from your application. Consider this space intermediate-term storage. Once any active production and/or analysis is complete and you no longer need regular access to the data, archive it within the ALCF (explained below) or transfer it to your home institution. This space has redundancy in the servers and storage but is so large that replication, snapshots, and backups are not practical. The table below indicates the capabilities and characteristics of each file system.

Archive Space

The archive space is intended for offline storage of results you wish to retain but either have no immediate need to access or have no room for in your parallel file system space. Archiving capabilities are available via HPSS. The primary means of HPSS access is HSI. HTAR is available, but its path length and file size limitations often cause it to fail. Globus Online and GridFTP can also be used with HPSS. Because a bad tape can cause data corruption or loss, users can request dual writes for particularly critical data. Such requests will be handled on a case-by-case basis.

Capacity and Retention Policies

Disk Policies

 

System                                          Vesta                     Mira/Cetus/Cooley         Theta
File system                                     /home        /projects    /home        /projects    /home        /projects
Default Quota [1]                               50 GB        500 GB       100 GB       1000 GB      100 GB       1000 GB
Quota Enforcement [2]                           hard/soft    hard/soft    hard/soft    hard/soft    hard/soft    hard/soft
Disk Redundancy [3]                             dual parity  dual parity  dual parity  dual parity  dual parity  dual parity
File Server Snapshots [6] (frequency/retained)  daily/7      none         daily/7      none         daily/7      none
File Server Redundancy                          yes          yes          yes          yes          yes          yes
File Server Metadata Replication [4]            yes          yes          yes          yes          yes          yes
File Server Data Replication [5]                no           no           yes          no           yes          no

 

Tape Policies

 

System                                          Vesta                     Mira/Cetus/Cooley         Theta
File system                                     /home        /projects    /home        /projects    /home        /projects
Automatic Backup to Tape? [7]                   yes          no           yes          no           yes          no
How Long on Disk After Project Completion?      12 months    3 months     12 months    3 months     12 months    3 months
Archived to Tape Before Deleted from Disk? [8]  yes          no           yes          no           yes          no
How Long on Tape After Project Completion? [9]  24 months    24 months    24 months    24 months    24 months    24 months

 

  1. While quotas are subject to negotiation on a case-by-case basis, disk space is a finite resource and projects must exercise good data management practices for their own sake and the sake of other users of the facility.
  2. “Hard quota enforcement” means a job will fail when writing output if you exceed the hard quota limit. “Soft quota enforcement” means you may exceed the soft quota limit (but never the higher hard quota limit) for up to seven days. If you do not drop back below the soft quota limit within seven days, writes will begin to fail.
  3. Hard drives are in redundancy groups of 10 disks (8 data + 2 parity). In other words, three out of 10 drives would have to fail before data loss occurred.
  4. Metadata (i.e., information listing which blocks are part of which files) is written twice to two different storage arrays. Thus, even if an entire array were lost, the metadata would be preserved.
  5. Data (user output) is written twice, with each block stored on two different storage arrays, so that even if an entire array were lost, the data would be preserved.
  6. Snapshots are stored on the home file system (see Home File System Space for more info). If you accidentally delete a file or directory, or need a previous version, use the cp command to copy it from the snapshot back to your home directory.
  7. “Yes” denotes that ALCF does regular backups without intervention from the user. In other cases, the user is responsible for archiving the data to HPSS or copying it to another facility as desired.
  8. Users who wish to retain data must archive or transfer their data elsewhere at the end of the project.
  9. Regardless of how the data was archived to tape (ALCF automatic backup or the user’s manual archiving), after this time the data is eligible for deletion and the storage space may be reclaimed.
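The soft and hard quota behavior in note 2 can be modeled as a small decision function. This is only a sketch of the rule as worded above, not the file system's actual enforcement logic. The function name write_allowed and its parameters are hypothetical, and the sketch assumes that the time at which usage first exceeded the soft limit is tracked elsewhere.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # grace period stated in note 2


def write_allowed(usage_after_write_gb: float,
                  soft_limit_gb: float,
                  hard_limit_gb: float,
                  over_soft_since: Optional[datetime],
                  now: datetime) -> bool:
    """Return True if a write leaving this much usage would succeed."""
    if usage_after_write_gb > hard_limit_gb:
        return False  # the hard limit can never be exceeded
    if usage_after_write_gb <= soft_limit_gb:
        return True   # within the soft limit, no restriction
    if over_soft_since is None:
        return True   # soft limit just crossed, grace period starts now
    # Over the soft limit: allowed only while the grace period lasts.
    return now - over_soft_since <= GRACE_PERIOD
```

For example, with illustrative limits of 100 GB soft and 120 GB hard, a project at 110 GB can keep writing for seven days after crossing 100 GB, while any write that would push usage past 120 GB fails immediately.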