Swift: Parallel Scripting for Petascale Data Analysis

Mike Wilde
Seminar

Swift is a system for the rapid and reliable specification, execution, and management of large-scale science and engineering computations. It supports applications that execute large numbers of tasks that pass data via files, a pattern common when analyzing large quantities of data or performing parameter studies and ensemble simulations.

The open source Swift software combines three components: a simple scripting language for the concise, high-level specification of complex parallel computations; mappers that provide convenient access to diverse data formats; and an execution engine that manages the dispatch of tasks to environments ranging from clusters to grids to petascale systems such as the Blue Gene/P.

This talk will describe the Swift programming model and show several examples of its application, on large-scale clusters, to problems in diverse scientific disciplines.