ALCF Data Science Program: Proposal Instructions

The ADSP Proposal Process

ADSP projects will be categorized as either "data science projects", which have a specific science goal, or "software technology projects", which focus on implementing the specific technology required to support data science, including complex workflows. Each project must self-identify its category in the proposal.

Proposals for the ADSP must have a plan for the science or technology to be accomplished, and include a description of what application development will occur throughout the duration of the project.

The forms and instructions attached should include everything needed to submit a proposal. These are, roughly, a simplified version of an INCITE proposal. Please direct any questions to adsp@alcf.anl.gov.

Proposal deadline is Thursday, June 15, 2017, 5:00 PM CST.

Evaluation of Proposals

ALCF, with the help of internal and external experts, will evaluate proposals on the strength of:

  • Potential impact of proposed science or software technology
  • Use of the architectural features of Theta
  • Science status:
    • Current capability of the proposed work including any scaling studies
    • Required development to use an HPC system like Theta
  • Software technology status:
    • Description of the current state of the technology, the scale previously tested, and an explanation of its fit for the Theta system
  • Data scale readiness (for both data science and the software technology)
    • Description of the data scale requirements and plans to realize the data science and/or technology at these scales.
  • Appropriateness of development team: whether the proposed expertise and person-hours are likely to succeed, that is, to accomplish the science goals or implement the technology
  • Diversity of science domains and algorithms
  • Prospects as an Aurora application, and intent to use Aurora

Submission

  • Submission deadline: June 15, 2017 at 5:00 PM CST
  • Prepare your proposal using the instructions below
  • Submit as a single PDF document
  • To submit: email adsp@alcf.anl.gov with the subject line "PI_Lastname ADSP Proposal Submission"
  • You may resubmit with revisions as needed up until the deadline.

Please direct any questions to adsp@alcf.anl.gov.

Proposal Instructions

Please create your proposal document with a project title, and the section headings noted below:

Section 1: PI and co-PI Information

1a. Principal Investigator (PI) Information

  • Last name, first name, title (Dr., Mr., Ms., etc.)
  • Institution
  • Street address
  • Email address

1b. Co-Principal Investigator (co-PI) Information

For each co-PI:

  • Last name, first name, title (Dr., Mr., Ms., etc.)
  • Institution
  • Street address
  • Email address

Section 2: Project Summary

2a. Executive Summary

  • Write an executive summary that accurately describes your proposed research and the high-impact scientific advances you will achieve with access to resources at the ALCF (1/2 page).

2b. Benefit to Community

  • Write a description of the benefit your project will provide to the science community (1/2 page).

2c. Impact Statement

  • Provide a two-sentence project summary that can be used to describe the impact of your project to the public (50 words maximum).

2d. Science/Technology Summary

  • Please identify the category: Science/Technology/Both
  • If submitting as a science project: Write a description of the science problem you would like to address in the 2018 time frame. Include research that will need to be completed in the next two years to lead up to this work (1 page).
  • If submitting as a software technology project: Write a description of the technology you would like to develop and deploy in the 2018 time frame. Include any research or development that must be completed to build up to this work (1 page).

2e. Application Summary

  • 2e.i. Application Requirements: List your application requirements, including languages (C, C++, Fortran, Java, Python, etc.), toolkits and frameworks (TensorFlow, Neon, Torch, Caffe, etc.), libraries, and current parallel method (MPI, OpenMP, Spark, etc.) (1 page).
  • 2e.ii. Application Description: Write a description of the current application, including methods, parallelization, workflows, I/O, etc. (1 page).
  • 2e.iii. Application Performance: Describe any performance data, such as scaling, of the current application components, including methods, I/O, workflows, etc. (1/2 page).
  • 2e.iv. Application Development Needed: Describe the development needed to run on Theta. Consider how you might use node-level parallelism and the memory hierarchy on the KNL nodes, and how you might use the SSD on each Theta node for purposes including, but not limited to, data persistence, data staging, and out-of-core access (1 page).

Section 3: Estimate of Resources Requested

In this section, please describe the hardware resources required for the planned work. Projects may request time on any of the ALCF resources including Theta, Mira, Cooley, and the Sage Urika-GX system at Argonne's Joint Laboratory for System Evaluation (JLSE).

  • Theta has 3,624 nodes, each with a 64-core KNL processor having up to 16 gigabytes of high-bandwidth in-package memory and 192 gigabytes of DDR4 RAM. Each node has 128 gigabytes of node-local SSD storage. The aggregate peak compute speed is 9.65 petaflops. The system has a 10-petabyte Lustre parallel file system.
  • Mira is a 10-petaflops IBM Blue Gene/Q system consisting of 48K nodes with a 5D torus interconnect; each node has 16 cores with four hardware threads per core and 16 gigabytes of RAM.
  • Cooley is a visualization and analytics cluster with 126 compute nodes; each node has 12 CPU cores and one NVIDIA Tesla K80 dual-GPU card. The entire Cooley system has a total of 47 terabytes of system RAM and 3 terabytes of GPU RAM.
  • The Sage Urika-GX system at JLSE is a high-performance data analytics platform with support for a typical big data software stack. The system has 25 nodes connected by a high-performance Aries interconnect; each node has two 16-core Intel processors, 256 gigabytes of RAM, 800 gigabytes of SSD storage, and 4 terabytes of hard-disk storage.

3a. Theta Resources:

  • Theta time in core-hours
  • SSD Use
  • Storage in TB
  • Tape archive space in TB
  • Network Requirements
  • Breakdown of how you would use time on Theta to make final preparations for science runs, and for the science runs themselves. Preparations might include final scaling tests, science problem spin-up runs, etc. For the science runs themselves, estimate the total core-hours and break them down into separate components/milestones as appropriate. You will have access to computational resources for the two-year period (1/2-1 page).
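
As a rough illustration of the core-hour arithmetic behind such a breakdown, the sketch below multiplies nodes, cores per node, wall-clock hours, and number of runs for each milestone. It assumes jobs are charged for all 64 cores of every Theta node they occupy; that accounting rule, the helper name core_hours, and every run size in the plan are hypothetical placeholders, not ALCF guidance.

```python
# Illustrative core-hour estimate; all run sizes below are made-up placeholders.
THETA_TOTAL_NODES = 3624       # from the Theta description in Section 3
THETA_CORES_PER_NODE = 64      # from the Theta description in Section 3


def core_hours(nodes, wall_hours, runs=1, cores_per_node=THETA_CORES_PER_NODE):
    """Core-hours for `runs` jobs, each using `nodes` whole nodes for `wall_hours`.

    Assumption: a job is charged for every core on each node it occupies.
    """
    if not 0 < nodes <= THETA_TOTAL_NODES:
        raise ValueError("node count must be between 1 and the size of Theta")
    return nodes * cores_per_node * wall_hours * runs


# (milestone, nodes, wall-clock hours per run, number of runs) -- hypothetical values
plan = [
    ("scaling tests", 128, 2.0, 10),
    ("spin-up runs", 512, 6.0, 4),
    ("science runs", 1024, 12.0, 20),
]

breakdown = {name: core_hours(n, hours, runs) for name, n, hours, runs in plan}
total = sum(breakdown.values())

for name, hours_used in breakdown.items():
    print(f"{name:15s} {hours_used:>14,.0f} core-hours")
print(f"{'total':15s} {total:>14,.0f} core-hours")
```

Replacing the placeholder plan with your own milestones gives the per-milestone and total figures requested above.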

3b. Mira or Cooley Resources:

  • Identify resource and core-hours
  • Storage space in TB
  • Tape archive space in TB
  • Brief schedule for how you would use that time on Mira or Cooley: scaling tests, development (e.g., algorithms), verification, parameter sweeps, etc. Break this down into milestones as appropriate for your project (1/2 page).

3c. Sage Urika-GX Cluster Resources:

  • Identify resource and core-hours
  • Storage space in TB
  • SSD Use
  • Brief schedule for how you would use that time on the Sage system: scaling tests, development (e.g., algorithms), verification, parameter sweeps, etc. Break this down into milestones as appropriate for your project (1/4 page).

Section 4: Other Collaborations

Indicate whether your team, or others you are aware of using the same code base, have projects using other large-scale resources such as NERSC, OLCF, etc.

Section 5: Project Team Members

5a. Names and Levels of Effort

  • List the names and levels of effort (as a percentage of full-time) for all team members you expect to do work on the ADSP project.
  • For each person, include a CV. If you have trouble getting all of the CVs into the PDF proposal document you are submitting, attach them individually to the email you use to submit the proposal, or email adsp@alcf.anl.gov for assistance.
  • We may fund projects, in part, with postdoctoral scholars working on methods in data science as well as on data-centric runtimes/infrastructures. The postdoctoral scholars will be shared with other ADSP projects. Please identify the tasks you would like the postdocs to contribute to if your project is allocated one. Also describe how these tasks would be accomplished if your project is not allocated a postdoc.