Complex microbiomes play an important role in numerous fields. One critical challenge in their data analysis is to separate true biological data from contamination. Contemporary experimental procedures include negative controls from various sources, but their analysis is complicated by “well-to-well” contamination: contamination that associates with the position of samples during experimental procedures. This causes bacteria sampled from a true biological source to appear in nearby control samples, and vice versa. An analytic approach that accounts for this source of contamination would have a far-reaching impact on the field.

We recently developed a scalable deconvolution algorithm (FEAST) that unveils the latent structure of a given microbiome by modeling it as a convex combination of potential observed and unobserved source samples. While FEAST estimates the contribution of various contamination sources to each biological sample, it does not utilize the fact that different biological samples share the same contamination sources, nor their spatial positioning.
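To make the "convex combination" idea concrete, here is a minimal sketch in Python. It is not the FEAST implementation: it assumes the source profiles are fixed and fully observed, omits the unobserved source, and uses a plain expectation-maximization update for the mixing weights. The function name `estimate_mixing_proportions` and the toy data are hypothetical and only for illustration.

```python
def estimate_mixing_proportions(sink_counts, source_profiles, n_iter=200):
    """Estimate mixing weights for a sink modeled as a convex combination of sources.

    sink_counts: list of taxon counts observed in the sink sample.
    source_profiles: list of sources, each a list of relative abundances (sum to 1).
    Simplifying assumption: sources are known and fixed; no unobserved source.
    Returns a list of non-negative weights that sum to 1.
    """
    n_sources = len(source_profiles)
    weights = [1.0 / n_sources] * n_sources  # start from a uniform mixture
    for _ in range(n_iter):
        new_weights = [0.0] * n_sources
        for j, count in enumerate(sink_counts):
            if count == 0:
                continue
            mix = sum(weights[k] * source_profiles[k][j] for k in range(n_sources))
            if mix == 0.0:
                continue  # taxon not explained by any source; ignored in this sketch
            for k in range(n_sources):
                # E-step: probability that a read of taxon j came from source k
                responsibility = weights[k] * source_profiles[k][j] / mix
                new_weights[k] += count * responsibility
        norm = sum(new_weights)
        if norm == 0.0:
            break  # nothing in the sink is explained by the sources
        # M-step: renormalize expected read counts into mixing weights
        weights = [w / norm for w in new_weights]
    return weights


# Toy example: the sink is built as 80% of source_a plus 20% of source_b.
source_a = [0.7, 0.2, 0.1, 0.0]
source_b = [0.0, 0.1, 0.2, 0.7]
sink = [560, 180, 120, 140]
weights = estimate_mixing_proportions(sink, [source_a, source_b])
print(weights)
```

On this toy input the estimated weights should come out close to 0.8 and 0.2. Treating each sample independently like this is exactly the limitation described above: nothing in the sketch shares contamination sources across samples or uses their positions.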

In this project, we will extend FEAST to incorporate the spatial structure of the plate used for DNA sequencing, so that it accounts for well-to-well contamination. This will allow us to detect currently undetectable contamination that might skew the results of microbiome studies. We will validate our method on simulated and biological data.
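As an illustration of what simulated data for such a validation might look like, the following Python sketch mixes each well's profile with those of its adjacent wells. Everything in it is an assumption made for illustration, not part of the proposed method: the function names, the fixed `leak` fraction, and the choice to treat only the directly adjacent wells (including diagonals) as contamination sources.

```python
def well_to_coords(well):
    """Convert a well label such as 'B3' to zero-based (row, column)."""
    return ord(well[0].upper()) - ord("A"), int(well[1:]) - 1


def simulate_well_to_well(profiles, leak=0.05):
    """Return expected profiles after each well receives `leak` from adjacent wells.

    profiles: dict mapping well label -> list of relative abundances (sum to 1).
    Assumed kernel: wells whose row and column both differ by at most 1 are
    adjacent and contribute equally; wells without neighbors are unchanged.
    """
    coords = {well: well_to_coords(well) for well in profiles}
    contaminated = {}
    for well, profile in profiles.items():
        row, col = coords[well]
        neighbors = [
            other
            for other, (r2, c2) in coords.items()
            if other != well and abs(row - r2) <= 1 and abs(col - c2) <= 1
        ]
        if not neighbors:
            contaminated[well] = list(profile)
            continue
        share = leak / len(neighbors)  # each neighbor contributes equally
        mixed = [(1.0 - leak) * x for x in profile]
        for other in neighbors:
            for j, x in enumerate(profiles[other]):
                mixed[j] += share * x
        contaminated[well] = mixed
    return contaminated


# Toy plate: a biological sample next to a control, plus one distant well.
plate = {
    "A1": [1.0, 0.0],   # biological sample dominated by taxon 0
    "A2": [0.0, 1.0],   # control well dominated by taxon 1
    "H12": [0.5, 0.5],  # far from both, so unaffected under this kernel
}
print(simulate_well_to_well(plate, leak=0.05))
```

In this toy plate, A1 and A2 each pick up about 5% of the other's profile while H12 is unchanged, which mirrors the bidirectional leakage between samples and nearby controls described above.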

This project is eligible for a matching fund stipend from the Data Science Institute. This is not a guarantee of payment, and the total amount is subject to available funding.

Faculty Advisor

  • Professor: Tal Korem
  • Department/School: Systems Biology
  • Location: PH18-200
  • We develop analytic approaches and algorithms to analyze data from the human microbiome, the collection of bacteria that live in and on our bodies. We use these algorithms to pursue clinical questions where microbiome analysis could practically benefit patient care.

Project Timeline

  • Earliest starting date: 10/15/2020
  • End date:
  • Number of hours per week of research expected during Fall 2020: ~12

Candidate requirements

  • Skill sets: Master’s students, seniors, and exceptional juniors preferred. Sound theoretical knowledge of latent variable models and mixture models, as well as experience in implementing such models, is required. Experience with R, or strong motivation to learn it, is required.
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F1 or J1 visa: eligible
  • Academic Credit Possible: Yes