Understanding the interaction between human-associated microbial communities and human health is expected to revolutionize healthcare. Recent work found that this interaction is, in part, shaped by genetic differences between otherwise identical species in the microbiome. Detecting this variation, however, is a significant challenge. This project aims to profile microbial genetic variation within and across multiple patients' microbiomes. This will allow us to better compare and interpret this variation in the context of human disease, gaining mechanistic insight into complex human-microbiome interactions.

Current methods for characterizing microbiome genetic variation either rely on known reference genome sequences, which limits sensitivity, or are reference-free but highly sensitive to noise and fail to model highly variable regions. We will develop a graph representation that better encodes genomic variation in its topology. We have preliminary work adapting an existing method for genome assembly from reads with high error rates (Kolmogorov et al., 2019) to represent microbiome DNA sequencing data; hence, you will not have a cold start on this project. The project goals are to characterize, design, and implement a suitable graph representation that is space- and time-efficient, building on our existing codebase. A successful project will enable large-scale profiling of microbiome genomic variability, which we expect to lead to an exciting publication.

This project is eligible for a matching fund stipend from the Data Science Institute. This is not a guarantee of payment, and the total amount is subject to available funding.

Faculty Advisor

  • Professor: Tal Korem
  • Department/School: Systems Biology
  • Location: Presbyterian Hospital, 18-200
  • We develop analytic approaches and algorithms to analyze data from the human microbiome, the collection of bacteria that live in and on our bodies. We use these algorithms to pursue clinical questions where microbiome analysis could practically benefit patient care.

Project Timeline

  • Earliest starting date: 10/1/2020
  • End date:
  • Number of hours per week of research expected during Fall 2020: ~12

Candidate requirements

  • Skill sets: Students should be familiar with the Unix environment, C++, and Python, and have an interest in efficient algorithm design and implementation.
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F1 or J1 visa: eligible
  • Academic Credit Possible: No