Systems, from the Test Harness to the Warehouse

Eric Pershey
Seminar

Software systems have arisen to help people make sense of the billions of zeros and ones that make up ALCF's data. When Eric started working on the Test Harness for Mira, he noticed that ALCF had a massive amount of data but no easy way to retrieve or correlate it, and he set out to fix this. A number of systems were created specifically to address this problem. One of them is Storm, a Python/Django web application that houses a number of programs and systems: the Test Harness, ALCF's first data warehouse, Job Failure Analysis, Job Info, Bill's Dashboard, Machine Time Overlay, Usage Reports, Availability Reports, Validation, and much more. Storm ties directly into many of our databases and knows how to combine each data source with the others. It has been an invaluable tool in getting us to where we are now. Today we have the Data Warehouse, which holds almost all of the information required for our monthly and yearly reports to DOE, and it is still growing. There are at least 100 more known data sources that could be ETL'ed (Extract, Transform, Load) into the warehouse to increase our situational awareness; we just need to pick the right ones.