Multi-class probabilistic clustering of the human gut microbiome
Understanding the structure and function of the human gut microbiome is expected to revolutionize healthcare due to its many associations with human disease. A critical step in microbiome analysis is clustering, in which genomic sequences of unknown origin are assigned to the latent genomes present in the sample. Current clustering methods rely on mixture models, which fail to correctly model genomic sequences that are shared across multiple genomes. These shared sequences are of great importance, as they often encode antibiotic resistance genes that drive antibiotic-resistant outbreaks. This project’s goal is to develop a clustering algorithm that effectively clusters both shared and unique genomic sequences. We have developed two probabilistic models, both based on hierarchical Poisson factorization, that have already produced promising results. The work will focus on refining these models: robustly evaluating the current models, determining their limitations, and designing new models that improve on them. A successful project will enable, for the first time, scalable and comprehensive reconstruction of bacterial genomes. In turn, this will enable large-scale analysis of antimicrobial resistance in the context of the human gut microbiome. We anticipate that a successful project will result in an exciting publication.
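For applicants unfamiliar with hierarchical Poisson factorization, the following is a minimal sketch of its generic generative process, not the lab's actual models. It assumes, purely for illustration, a count matrix whose rows are genomic sequences and whose columns are samples; the function name `sample_hpf`, the hyperparameter values, and the row/column interpretation are all hypothetical.

```python
import numpy as np

def sample_hpf(n_rows, n_cols, n_factors,
               a=0.5, a_prime=2.0, b_prime=1.0,
               c=0.5, c_prime=2.0, d_prime=1.0, seed=0):
    """Draw one dataset from a generic hierarchical Poisson factorization.

    Assumption (illustrative only): rows are genomic sequences, columns
    are samples, and each factor loosely plays the role of a latent genome.
    """
    rng = np.random.default_rng(seed)

    # Per-row rate parameter. NumPy's gamma takes (shape, scale),
    # where scale = 1 / rate.
    xi = rng.gamma(shape=a_prime, scale=b_prime / a_prime, size=n_rows)
    # Non-negative row weights over factors. Unlike a mixture model,
    # a row may carry substantial weight on several factors at once.
    theta = rng.gamma(shape=a, scale=1.0 / xi[:, None],
                      size=(n_rows, n_factors))

    # Per-column rate parameter and column weights over factors.
    eta = rng.gamma(shape=c_prime, scale=d_prime / c_prime, size=n_cols)
    beta = rng.gamma(shape=c, scale=1.0 / eta[:, None],
                     size=(n_cols, n_factors))

    # Each observed count is Poisson with rate theta_row . beta_col.
    counts = rng.poisson(theta @ beta.T)
    return theta, beta, counts

theta, beta, counts = sample_hpf(n_rows=6, n_cols=4, n_factors=3)
```

Under this reading, a sequence shared across genomes would correspond to a row of `theta` with weight on more than one factor, which a single-assignment mixture model cannot represent directly.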
This project is eligible for a matching fund stipend from the Data Science Institute. This is not a guarantee of payment, and the total amount is subject to available funding.
Faculty Advisor
- Professor: Tal Korem
- Center/Lab: Korem Lab
- Location: Presbyterian Hospital 18-200
- Our research develops data analysis methods for multi-omic microbiome data. We focus on integrating clinical, microbiome, lifestyle and environmental data in a way that advances from statistical associations to actionable insights that can be used in clinical practice.
Project Timeline
- Earliest starting date: 9/7/21
- End date: 5/1/22
- Number of hours per week of research expected during Fall 2021: ~12
Candidate requirements
- Skill sets: Required: knowledge of programming and probability. Ideal: knowledge of or experience with Bayesian statistics and graphical models.
- Student eligibility: freshman, sophomore, junior, senior, master’s
- International students on F1 or J1 visa: eligible
- Academic Credit Possible: Yes