Experiments in systems research are challenging to reproduce; researchers have to choose between pursuing a fast-paced research agenda and developing well-organized, sufficiently documented, and easily reproducible systems. As with fiscal debt, there are often tactical reasons to take on technical debt in experimental systems research, such as deferring documentation, organization, refactoring, and unit tests when pursuing a new idea or meeting a conference deadline. But more often than not, this technical debt is never repaid, leading to irreproducible experiments. In this talk, we will first present different levels of technical debt and how they are ascribed to research articles. We will then present Sciunit (http://sciunit.run), an automatic containerization system for conducting reproducible science. Sciunit uses OS-level tracing mechanisms to transparently create application containers, which can be run portably in similar OS environments. Sciunit captures provenance to provide reproducibility guarantees and to help repay different levels of technical debt. We will demonstrate the system on a few data-intensive science use cases and discuss making Sciunit part of an experiment life cycle.
Bio: Tanu Malik is an assistant professor in the School of Computing at DePaul University and directs the Data Systems and Optimization Lab. Her research interests span database systems, data provenance, distributed systems, and cyber-infrastructure for scientific data management. Tanu is also a recipient of the 2019 NSF CAREER Award.
This seminar will be streamed. See details at https://anlpress.cels.anl.gov/cels-seminars/