Systems, from the Test Harness to the Warehouse

Event Sponsor: 
Leadership Computing Facility Seminar
Start Date: 
Jul 21 2015 - 12:00pm
Building/Room: 
Building 241/Room D173
Location: 
Argonne National Laboratory
Speaker(s): 
Eric Pershey
Speaker(s) Title: 
Argonne National Laboratory - LCF (AIG)

Software systems have arisen to help individuals make sense of the billions of zeros, ones and twos of ALCF data. When Eric started working on the Test Harness for Mira, he noticed that ALCF had a massive amount of data but no easy way to retrieve or correlate it, and he set out to fix this. A number of systems were created specifically to address this problem. Storm is one of them: a Python/Django web application that houses a number of programs and systems, including the Test Harness, ALCF's first data warehouse, Job Failure Analysis, Job Info, Bill's Dashboard, Machine Time Overlay, Usage Reports, Availability Reports, Validation and much more. It ties directly into many of our databases and knows how to combine each data source with the others, making it an invaluable tool that helped lead us to where we are today. Today we have the Data Warehouse, which houses almost all of the information required for monthly and yearly reporting to DOE and is still growing. There are at least 100 more data sources that we know about that could be ETL'ed (Extract, Transform, Load) into the warehouse to increase our situational awareness; we just need to pick the right ones.

Miscellaneous Information: 

Upcoming Presenters:

Aug 18 - Venkat Vishwanath (GLEAN)
Sept 8 - Vitali Morozov (SKOPE)
Oct 20 - Prasanna Balaprakash (Empirical Performance Tuning via Machine Learning)
Nov 17 - George Rojas (Project Ni)
Dec 15 - Bill Allcock (RAN)
Jan 19 - Doug Waldron (Data Warehouse)
Feb 16 - Paul Rich (Cobalt)