Hayai-Annotation: a GUI R-package for an ultra-fast gene annotation system in plants

Event Sponsor: Data Science and Learning Seminar
Start Date: Jul 11 2018 - 3:00pm
Building/Room: Building 240/Room 1407
Location: Argonne National Laboratory
Speaker(s): Andrea Ghelfi
Speaker(s) Title: Kazusa DNA Research Institute
Host: Chris Henry

The main goals of plant science and breeding are to understand plant biological systems in order to describe patterns of evolution and diversity, and to increase crop productivity and quality by improving biotic and abiotic stress tolerance. To accelerate this work, molecular biologists and breeders need a broad and accurate picture of the gene content of genomes. Because advances in next-generation sequencing (NGS) are making genome sequencing faster and cheaper, even for crops with complex, highly polyploid genomes, a high-throughput and, above all, fast annotation workflow is required.

In this study we propose Hayai-Annotation, a GUI R-package for automated, ultra-fast, and accurate gene annotation of plant species (both model and non-model organisms). The workflow is based on sequence similarity searches with USEARCH against a UniProtKB database restricted to the taxon Embryophyta (land plants). Hayai-Annotation uses the complete set of UniProtKB protein information to provide five levels of annotation: gene name; Gene Ontology (GO) terms in the three main categories (Biological Process, Molecular Function and Cellular Component); Enzyme Commission (EC) code; protein existence level; and evidence type.
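The core idea of this workflow, a similarity search followed by transferring UniProtKB annotations from the matched protein to the query gene, can be sketched in a few lines. This is not Hayai-Annotation's code (the package itself is written in R); it is a minimal Python illustration under stated assumptions: USEARCH was run with `-blast6out` (12 tab-separated, BLAST-style columns), database labels follow the UniProtKB FASTA style `sp|ACCESSION|ENTRY_NAME`, and the best hit per gene is chosen by bit score. The functions `best_hits` and `annotate`, the `min_identity` threshold, and the toy annotation table are all hypothetical.

```python
import csv
import io

# Column layout of USEARCH -blast6out (same 12 columns as BLAST tabular format 6).
BLAST6_COLUMNS = ["query", "target", "identity", "aln_len", "mismatches",
                  "gap_opens", "qstart", "qend", "tstart", "tend",
                  "evalue", "bitscore"]


def best_hits(blast6_text, min_identity=0.0):
    """Keep the highest-bit-score hit per query (assumed selection rule)."""
    best = {}
    for row in csv.reader(io.StringIO(blast6_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        hit["identity"] = float(hit["identity"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["identity"] < min_identity:
            continue  # hypothetical identity cutoff
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return best


def annotate(best, uniprot_table):
    """Copy the annotation fields of each best hit onto its query gene."""
    result = {}
    for query, hit in best.items():
        # Assumes UniProtKB-style labels such as "sp|ACC001|A_PLANT".
        target = hit["target"]
        accession = target.split("|")[1] if "|" in target else target
        record = uniprot_table.get(accession, {})
        result[query] = {"accession": accession,
                         "identity": hit["identity"], **record}
    return result


# Toy USEARCH output and a hypothetical annotation table (gene name, GO, EC);
# protein existence level and evidence type would be further columns.
blast6 = (
    "g1\tsp|ACC001|A_PLANT\t92.5\t300\t22\t1\t1\t300\t1\t300\t1e-150\t520\n"
    "g1\tsp|ACC002|B_PLANT\t70.0\t290\t87\t3\t1\t290\t5\t294\t1e-80\t300\n"
    "g2\tsp|ACC002|B_PLANT\t55.0\t200\t90\t4\t10\t209\t1\t200\t1e-40\t150\n"
)
toy_uniprot = {
    "ACC001": {"gene_name": "geneA", "go": ["GO:toy1"], "ec": "toy-EC"},
    "ACC002": {"gene_name": "geneB", "go": ["GO:toy2"], "ec": None},
}
ann = annotate(best_hits(blast6), toy_uniprot)
```

Here gene `g1` inherits the fields of `ACC001`, its higher-scoring hit. Raising `min_identity` to 60 would leave `g2` unannotated. Best-hit transfer is only one possible rule; a real pipeline might also weigh coverage or e-value.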

Hayai-Annotation was used to compare the annotations of five plant species (sweet cherry, peach, strawberry, fig and Arabidopsis) with respect to the distribution of genes across GO terms (at the gene level and the parent-term level) and EC codes. We concluded that Hayai-Annotation is an ultra-fast tool for detecting differences that arise from particularities of the gene prediction methodology, such as the presence of transposons and retrotransposons in fig. Additionally, we observed more genes per GO term in Arabidopsis than in the other species studied, particularly among fast-evolving genes. We also detected more genes related to disease resistance in sweet cherry and peach than in strawberry and fig. Finally, we may have found a different pattern of defense response between Arabidopsis and the other species studied.