2020 ALCF Simulation, Data, and Learning Workshop

The ALCF's Simulation, Data, and Learning Workshop is designed to help researchers improve the performance and productivity of simulation, data science, and machine learning applications on ALCF systems. Workshop participants will have the opportunity to:

Work directly with ALCF staff experts during dedicated hands-on sessions
Learn how to use available tools and frameworks to improve productivity
Test and debug codes with exclusive system reservations on ALCF computing resources
Get assistance with Director's Discretionary projects to help prepare for a major allocation award
Improve the performance of existing ALCF projects
Plan ahead for 2021-2022 allocation proposal submissions

REGISTRATION DEADLINES

U.S. Citizens: November 23, 2020
Foreign Nationals: November 13, 2020

Note: Registrants will be reviewed for experience level and will be asked to provide goals for attending.

Workshop Overview

12/8 Day One (10AM-3PM Central Time) will be a hands-on tutorial introducing distributed data-parallel training on ALCF systems. Experts will be on hand as you run through examples from our Git repo. These examples will teach you how to run deep learning training on multi-GPU nodes and across multiple nodes of ThetaGPU, or on a multi-CPU system like Theta. We will also discuss how to build proper data pipelines to keep your workflows humming.

Introductory on-boarding for ALCF systems
Data-parallel training with TensorFlow and Horovod
- Overview
- Hands-on session
Building data pipelines to use accelerators effectively
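
For readers new to the idea, the core of synchronous data-parallel training can be sketched in a few lines. The sketch below is plain Python and makes no TensorFlow or Horovod calls; the function names (`shard_gradient`, `allreduce_mean`, `data_parallel_step`) and the toy linear model are assumptions chosen for illustration, not material from the workshop repo. Each simulated worker computes a gradient on its own shard of the batch, the gradients are averaged (the role an allreduce plays in a real framework), and every worker applies the same update.

```python
# Pure-Python illustration of synchronous data-parallel SGD.
# Hypothetical helper names; no TensorFlow or Horovod APIs are used.

def shard_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w on one worker's shard."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker receives the average value."""
    return sum(values) / len(values)

def make_shards(xs, ys, num_workers):
    """Split the batch into equal-sized contiguous shards, one per worker."""
    size = len(xs) // num_workers
    return [
        (xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
        for i in range(num_workers)
    ]

def data_parallel_step(w, shards, lr):
    """One synchronous step: local gradients, averaged, then one shared update."""
    grads = [shard_gradient(w, xs, ys) for xs, ys in shards]
    return w - lr * allreduce_mean(grads)

def train(xs, ys, num_workers=4, lr=0.01, steps=200):
    """Fit y = w*x starting from w = 0 with simulated workers."""
    shards = make_shards(xs, ys, num_workers)
    w = 0.0
    for _ in range(steps):
        w = data_parallel_step(w, shards, lr)
    return w

# Toy data generated from y = 3x, split across four simulated workers.
xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]
w_final = train(xs, ys)
```

With equal-sized shards, the averaged gradient matches the gradient computed on the full batch, which is why adding workers changes throughput rather than the update itself.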

12/9 Day Two (10AM-3PM Central Time) will focus on DeepHyper, a tool for distributed hyperparameter optimization. Again, this will be done via tutorial using examples from our Git repo, with experts walking you through the steps. We will also cover how to identify performance issues using common profilers such as VTune, TAU, and the built-in TensorFlow Profiler.

Running distributed hyperparameter optimization with DeepHyper
Profiling deep learning frameworks to optimize your workflow
- Using TensorFlow Profiler
- Profiling with TAU and VTune
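
As a rough picture of what a hyperparameter search does, the sketch below runs a serial random search in plain Python. It is not DeepHyper and does not use its API; random search is a simpler stand-in for the optimization strategies a dedicated tool provides. The `objective` function, the search ranges, and all names are hypothetical, and a real run would train and validate a model in place of the toy formula.

```python
import random

def objective(config):
    """Hypothetical validation loss; a real run would train a model here."""
    lr = config["learning_rate"]
    units = config["units"]
    return (lr - 0.01) ** 2 * 1e4 + (units - 64) ** 2 / 64.0

def sample_config(rng):
    """Draw one configuration: log-uniform learning rate, categorical layer width."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "units": rng.choice([16, 32, 64, 128, 256]),
    }

def random_search(num_trials, seed=0):
    """Evaluate num_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best_config, best_loss = random_search(num_trials=50)
```

Because each trial is independent, the loop body is the part a distributed tool farms out to many nodes at once.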

12/10 Day Three (10AM-3PM Central Time) will cover the important topic of how to deploy a performant, trained deep network at scale in a simulation. Integrating model inference into distributed simulations will be covered using tutorials from our Git repo.

Integrating inference into distributed simulation
Demonstration using example simulations

Visit our GitHub repo for workshop materials.

Agenda

Day 1: Tuesday, December 8	Topic	Speaker(s)
9:30 - 10:00 am (CT)	Attendee check-in
10:00 - 10:10 am (CT)	Welcome	Michael Papka (ALCF Director)
10:10 - 11:10 am (CT)	Intro to SDL Workshop [Video]	Taylor Childers (Argonne)
11:10 am - 1:00 pm (CT)	Distributed Deep Learning [Video 1, Video 2]	Huihuo Zheng, Corey Adams (Argonne)
1:00 - 2:00 pm (CT)	Lunch Break
2:00 - 4:00 pm (CT)	Building Data Pipelines [Video]	Taylor Childers (Argonne)
Day 2: Wednesday, December 9
10:30 - 11:00 am (CT)	Attendee check-in
11:00 am - 1:00 pm (CT)	Distributed Hyper Optimization [Video 1, Video 2, Video 3]	Kyle Felker, Misha Salim, Romit Maulik, Sam Foreman (Argonne)
1:00 - 2:00 pm (CT)	Lunch Break
2:00 - 4:00 pm (CT)	Profiling Deep Learning [Video 1, Video 2]	Huihuo Zheng, Murali Emani, Taylor Childers (Argonne)
Day 3: Thursday, December 10
10:30 - 11:00 am (CT)	Attendee check-in
11:00 am - 1:00 pm (CT)	Integrating Inference into Simulation [Video]	Bethany Lusch, Romit Maulik (Argonne)
1:00 - 2:00 pm (CT)	Lunch Break
2:00 - 4:00 pm (CT)	Hands-on Session
4:00 - 4:30 pm (CT)	Applying for ALCF Allocation Programs [Video]	Katherine Riley (Argonne)
4:30 - 5:00 pm (CT)	Closing Remarks and Wrap-Up/Next Steps	Ray Loy (Argonne)

12/08/2020, 10am CT
