Scalable Microbiome Metagenome Assembly and Profiling

Sebastien Boisvert
Seminar

Current-generation DNA sequencers generate billions of reads per run. These machines operate in parallel, but most of the analysis software available to the genomics community runs as a single process with one or more threads, and therefore does not scale the way sequencing does. Software must instead scale beyond one process by using message passing, so that many processes on many machines collaborate by exchanging messages.

Here, we present Ray Meta and Ray Communities and show how they can assemble samples de novo and profile them with unmatched scalability. Ray Meta performs de novo assembly by traversing a distributed de Bruijn subgraph, while Ray Communities uses virtual coloring to label the distributed k-mer objects with known sequences after the assembly steps. Ray is built on the minimalist RayPlatform framework, which provides services such as a modular plugin architecture for registering handlers with a distributed state machine (handler tables), a virtual communicator (message aggregation), a virtual processor (user-space threads), and a virtual message router. RayPlatform also provides detailed profiling information to any application built on it.
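The idea of distributed k-mer objects combined with message aggregation can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions and does not reproduce Ray's or RayPlatform's code. It assumes that each k-mer is owned by exactly one rank, chosen by a deterministic hash, and that k-mers bound for the same rank are buffered and sent together as one message. The names `owner`, `AggregatingSender`, and `distribute` and the constants `K`, `NUM_RANKS`, and `BUFFER_SIZE` are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import defaultdict
import zlib

# Hypothetical parameters for illustration only; not taken from Ray.
K = 5            # k-mer length
NUM_RANKS = 4    # number of simulated processes
BUFFER_SIZE = 3  # k-mers buffered per destination before a message is sent


def owner(kmer: str, num_ranks: int = NUM_RANKS) -> int:
    """Assign each k-mer to exactly one rank using a deterministic hash."""
    return zlib.crc32(kmer.encode()) % num_ranks


def kmers(read: str, k: int = K):
    """Yield every overlapping k-mer of a read (reverse complements ignored)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


class AggregatingSender:
    """Hypothetical aggregator: buffers k-mers per destination rank and
    emits one message per full buffer instead of one message per k-mer."""

    def __init__(self, buffer_size: int = BUFFER_SIZE):
        self.buffer_size = buffer_size
        self.buffers = defaultdict(list)
        self.messages = []  # (destination, payload) pairs that were "sent"

    def send(self, dest: int, kmer: str) -> None:
        buf = self.buffers[dest]
        buf.append(kmer)
        if len(buf) >= self.buffer_size:
            self.flush(dest)

    def flush(self, dest: int) -> None:
        if self.buffers[dest]:
            self.messages.append((dest, self.buffers[dest]))
            self.buffers[dest] = []

    def flush_all(self) -> None:
        # Send whatever is left in partially filled buffers.
        for dest in list(self.buffers):
            self.flush(dest)


def distribute(reads):
    """Route every k-mer to its owner rank; each rank counts what it receives."""
    sender = AggregatingSender()
    for read in reads:
        for kmer in kmers(read):
            sender.send(owner(kmer), kmer)
    sender.flush_all()

    # Each simulated rank stores only the k-mers it owns, with occurrence counts.
    tables = [defaultdict(int) for _ in range(NUM_RANKS)]
    for dest, payload in sender.messages:
        for kmer in payload:
            tables[dest][kmer] += 1
    return tables, sender.messages


if __name__ == "__main__":
    tables, messages = distribute(["ACGTACGTAC", "GTACGTACGG"])
    for rank, table in enumerate(tables):
        print(rank, dict(table))
    print("messages sent:", len(messages))
```

Grouping many small k-mer messages into one message per destination trades a little latency for far fewer sends, which is the motivation for message aggregation. With the two 10-base reads above there are 12 k-mers, and with a buffer of 3 and 4 ranks the simulation sends at most 6 messages instead of 12.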