Biological Sequence Annotation by Machine Learning Techniques

Event Sponsor: Computing, Environment and Life Sciences Seminar
Start Date: May 18 2017 - 10:00am
Building 240/Room 1404
Argonne National Laboratory
M. Volkan Atalay
Tom Brettin

Our research group has been developing and applying machine learning techniques for the annotation of biological sequences. The main approach that we have developed, called Subsequence Profile Map (SPMap), is both generative and discriminative and is based on feature space mapping. We initially used SPMap to predict the functions of proteins from their sequences. Instead of focusing on function-specific motifs, SPMap considers all of the subsequences as a distribution over a quantized space by discretizing and reducing the dimension of an otherwise huge space of all possible subsequences. It can therefore assign new functions to sequences with missing annotations. We formulated the function prediction problem as a classification problem defined on Gene Ontology (GO) terms. We presented a method to form positive and negative training examples while taking into account the directed acyclic graph (DAG) structure and evidence codes of GO. In addition to SPMap, we have devised and implemented BLAST k-nearest neighbor (BLAST-kNN) and peptide statistics combined with SVMs (PEPSTATS-SVM). We applied these two additional methods and their weighted combinations and built a system called GOPred for sequence annotation. Results show that combining different methods improves prediction accuracy in the majority of cases. We have extended the application of GOPred to a large-scale protein database, UniProt. We applied the same methodology to the classification of enzyme sequences into enzyme classes. Furthermore, a novel method is currently being studied to predict whether an input protein sequence is an enzyme or a non-enzyme. We are also investigating the use of deep learning, again for function prediction, in addition to virtual screening for target detection in drug development. Both the subsequence profile map and deep learning can be applied to antibiotic resistance.
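As a rough illustration of treating a sequence as a distribution over a quantized subsequence space, the following Python sketch maps a sequence to a normalized histogram over a small set of representative subsequences. It is not the SPMap implementation: the function names, the fixed list of representatives, and the use of Hamming distance for the nearest-representative assignment are all assumptions made for this example.

```python
from collections import Counter


def subsequences(seq, k):
    """Return every contiguous length-k subsequence of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def quantized_profile(seq, representatives):
    """Map seq to a normalized histogram over the representatives.

    Each length-k subsequence is assigned to its nearest representative
    (Hamming distance is an assumption of this sketch); ties go to the
    representative that appears first in the list.
    """
    k = len(representatives[0])
    counts = Counter()
    for sub in subsequences(seq, k):
        nearest = min(range(len(representatives)),
                      key=lambda j: hamming(sub, representatives[j]))
        counts[nearest] += 1
    total = sum(counts.values())
    # A sequence shorter than k has no subsequences; return an all-zero vector.
    return [counts[j] / total if total else 0.0
            for j in range(len(representatives))]


# Hypothetical toy input: two representatives of length 3.
profile = quantized_profile("AACC", ["AAA", "CCC"])
print(profile)  # [0.5, 0.5]
```

The resulting fixed-length vector could then be passed to any standard classifier, which is the sense in which a huge space of possible subsequences is reduced to a low-dimensional feature space.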
M. Volkan Atalay is a professor of Computer Engineering at the Middle East Technical University (METU), Ankara, Turkey. He obtained his “Diplôme de Docteur dans la spécialité Informatique” (Ph.D. in computer science) from Université Paris Descartes, Paris, France. During a sabbatical leave, he spent a year (2004) at the Virginia Bioinformatics Institute, Virginia Tech, VA, USA. From 2010 to 2016, he was the Vice President for Research of METU and Chairman of the Board of Directors of ODTU TEKNOKENT (METU Technopolis). His main responsibilities included strategies and policies for research and for university-industry partnership, relations with both public and private institutions, and corporate strategy management.

His research interests lie in the area of machine learning in bioinformatics. He is also involved in activities related to technology-based innovation, technology-based entrepreneurship, technology management, and research policies and strategies.