Hierarchical Non-parametric Bayesian Mixture Models and Applications on Big Data

Halid Yerebakan
Seminar

Abstract:
In a wide variety of domains, from text mining to the life sciences, there is an abundance of raw data with unknown structure. Unsupervised learning, a branch of machine learning, focuses on methods that can discover this structure without label information. Traditional methods in this area assume a fixed structure and fit models to the data under that assumption. Assuming a specific structure, however, limits the ability of machine learning algorithms to adapt to rare or emerging patterns. Bayesian non-parametric models offer the flexibility to discover hidden structure in data beyond these limits, but the scalability of their inference algorithms has been a concern in recent years. In this presentation, we will focus on sampling-based inference mechanisms for Bayesian non-parametric models and demonstrate how to scale them using modern parallel architectures.

Speaker Biography:
Halid Z. Yerebakan is a Ph.D. candidate in the Computer Sciences Department at Purdue University and a research assistant in the Computer and Information Science Department at IUPUI. His research focuses on hierarchical non-parametric Bayesian models and their applications in clustering and text mining. Halid is currently developing efficient parallel stochastic sampling algorithms that exploit the conditional independence structure of these models. During his Ph.D., he interned at Siemens Healthcare, Dow AgroSciences, and Bashpole Software.