Complex microbial communities play an important role in numerous fields, from human health to bioremediation. One critical challenge in analyzing their data is separating true biological signal from contamination introduced by various sources. While contemporary experimental procedures include various negative controls, no comprehensive statistical framework for analyzing these controls has been developed. Such a framework would have a far-reaching impact on the field.

We have recently developed a scalable deconvolution algorithm (FEAST) that unveils the latent structure of a given microbial community by modeling it as a convex combination of potential observed and unobserved source ecosystems. While FEAST estimates the contribution of various contamination sources to each biological sample, it does not utilize the fact that different biological samples share the same contamination sources.
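To make the "convex combination of sources" idea concrete, below is a minimal Python sketch of the core estimation step: expectation-maximization (EM) for the mixing proportions of a sink sample, given a fixed set of observed source profiles. It is not FEAST's implementation or API. The function name `estimate_mixing_proportions` is hypothetical, and the sketch assumes (a) only observed sources, with no unobserved source, and (b) each source's relative-abundance profile is fixed and noise-free. Both are simplifications relative to the model described above.

```python
import numpy as np


def estimate_mixing_proportions(sink, sources, n_iter=500, tol=1e-10):
    """Hypothetical sketch: EM estimate of mixing proportions alpha.

    Models the sink's counts as draws from sum_j alpha_j * p_j, where p_j is
    the relative-abundance profile of source j and alpha lies on the simplex.
    Assumes every taxon observed in the sink appears in at least one source.
    """
    sink = np.asarray(sink, dtype=float)          # counts per taxon in the sink
    sources = np.asarray(sources, dtype=float)    # shape (n_sources, n_taxa)
    profiles = sources / sources.sum(axis=1, keepdims=True)  # normalize rows
    k = profiles.shape[0]
    alpha = np.full(k, 1.0 / k)                   # start from uniform mixing

    for _ in range(n_iter):
        # E-step: probability that a read of taxon i came from source j.
        weighted = alpha[:, None] * profiles
        mix = weighted.sum(axis=0)
        resp = np.divide(weighted, mix, out=np.zeros_like(weighted), where=mix > 0)
        # M-step: expected fraction of sink reads attributed to each source.
        new_alpha = (resp * sink).sum(axis=1) / sink.sum()
        converged = np.abs(new_alpha - alpha).max() < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


# Example: two overlapping sources (e.g., a biological source and a
# negative-control profile) mixed 25% / 75% in the sink.
print(estimate_mixing_proportions([300, 700], [[60, 40], [20, 80]]))
```

Because the log-likelihood is concave in alpha, EM converges to the maximum-likelihood proportions, here approximately `[0.25, 0.75]`. Convergence is linear and slows as source profiles become more similar.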

In this project, we will extend FEAST to use additional information available in common experimental setups, effectively sharing contamination sources between samples. Specifically, we will develop a sequential inference algorithm that takes as input multiple samples from the same microbial environment, as well as potential sources of contamination, and outputs the inferred “true” biological contribution to each sample. We will validate the resulting method on simulated and biological data.

This project is eligible for a matching fund stipend from the Data Science Institute. This is not a guarantee of payment, and the total amount is subject to available funding.

Faculty Advisor

  • Professor: Tal Korem
  • Department/School: Systems Biology
  • Location: Presbyterian Hospital 18-200
  • We develop analytic approaches and algorithms to analyze data from the human microbiome, the collection of bacteria that live in and on our bodies. We use these algorithms to pursue clinical questions where microbiome analysis could practically benefit patient care.

Project Timeline

  • Earliest starting date: 10/15/2020
  • End date:
  • Number of hours per week of research expected during Fall 2020: ~12

Candidate requirements

  • Skill sets: Master’s students, seniors, and exceptional juniors preferred. Some knowledge of latent variable models and mixture models is required, as well as experience with R or strong motivation to learn it.
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F1 or J1 visa: eligible
  • Academic Credit Possible: Yes