Getting Started on ThetaGPU

In this webinar, we will cover three main topics to help researchers get started with ThetaGPU: (1) compiling and running, (2) profiling and performance analysis, and (3) AI and frameworks.

Topic 1: Compiling and Running

This session lays out the information new users of the ThetaGPU supercomputer (an NVIDIA DGX A100 machine) need, from environment setup through compilation to job execution of a simulation and/or machine learning code. We will provide an overview of the hardware and software environment, including the NVIDIA A100 GPUs, the compute, service, and login nodes, the Cobalt job scheduler, the environment module system, and the pre-installed software libraries.

Topic 2: Profiling and Performance Tools

This session will cover NVIDIA's Nsight Systems and Nsight Compute tools. Nsight Systems gives developers a system-wide visualization of an application's performance, so they can find and optimize bottlenecks and scale efficiently across any number or size of CPUs and GPUs on ThetaGPU. Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through both a graphical user interface and a command-line tool. Step-by-step guides for Nsight Systems and Nsight Compute will be presented, along with a quick demo on ThetaGPU.

Topic 3: AI and Frameworks

This session will summarize the machine learning software available on ThetaGPU, including conda, containers, scaling software, and performance tools.

Kyle Gerard Felker, JaeHyuk Kwack, Corey Adams