Building the DOE Systems Biology Knowledgebase

Rick Stevens
Seminar

This presentation will walk the audience through the development of the DOE Systems Biology Knowledgebase (KBase), a large-scale development project led by Argonne, Berkeley, Brookhaven, and Oak Ridge National Laboratories, with participation from Cold Spring Harbor Laboratory and multiple university partners. Started in 2011, the KBase project is building the first multi-domain systems biology knowledgebase aimed at advancing predictive biology in microbes, microbial communities, and plants. The project is doing three things. It is integrating data from many existing sources. It is building tools and services that support complex workflows, such as modeling microbes and reconciling experimental data with computational predictions. It is also providing a large number of computational services that go beyond those offered by existing integrated biological databases. KBase is deployed on purpose-built infrastructure spanning multiple laboratories that collectively house multiple petabytes of data, and it will support scalable computing in both cloud and cluster environments.

End users can access many thousands of public microbial genomes and related datasets, tens of thousands of metagenomic samples, and dozens of plant genomes and phenotype datasets. In addition to providing web and programmatic interfaces to these data, KBase lets users upload their own private data and virtually integrate it with the public datasets for comparative analysis and model development. KBase also aims to enable collaborative workflows and multiple modes of sharing. The development team is integrating resources such as MicrobesOnline, The SEED, RAST, Model SEED, MG-RAST, and other systems into a coherent, user-oriented computing environment with a unified API.