The goal of this project is to evaluate algorithms that predict gene expression directly from sequencing data. With large-scale sequencing data available in ADSP and recent progress in machine learning methods, it is now possible to model long-range interactions in the DNA sequence to infer intermediate phenotypes such as gene expression. First, we will test Enformer, a deep learning method that integrates long-range interactions in the genome (such as promoter-enhancer interactions) to predict gene expression from sequence. Using RNA-sequencing data available for a small number of samples (e.g., the ROSMAP cohort), we will optimize the algorithm to improve prediction accuracy. Second, we will use the inferred expression in the ADSP cohorts to test for association with Alzheimer’s Disease and related endophenotypes. Finally, we will incorporate datasets that become available in the future, such as cell-type-specific ATAC-seq and disease-specific gene expression, to retrain the models and improve gene expression prediction directly from sequencing data.
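
As an illustration of what "accuracy of prediction" could mean in practice, one common choice is the per-gene Pearson correlation, across samples, between predicted and measured expression. The sketch below is only an assumption about how such a score might be computed, not the project's actual evaluation protocol; the function names (`pearson`, `per_gene_accuracy`), the gene labels, and the toy values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns NaN when either sequence is constant (zero variance).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def per_gene_accuracy(predicted, observed):
    """Score each gene by the cross-sample correlation of predicted vs. observed.

    Both arguments map a gene name to a list of per-sample expression values,
    with samples in the same order. Genes missing from `predicted` are skipped.
    """
    return {
        gene: pearson(predicted[gene], observed[gene])
        for gene in observed
        if gene in predicted
    }


# Toy values for illustration only (hypothetical genes, four samples each).
observed = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [2.0, 1.0, 4.0, 3.0]}
predicted = {"GENE_A": [1.1, 1.9, 3.2, 3.8], "GENE_B": [5.0, 5.0, 5.0, 5.0]}

accuracy = per_gene_accuracy(predicted, observed)
for gene, r in accuracy.items():
    print(gene, r)
```

A gene whose predicted values do not vary across samples, like the hypothetical `GENE_B` here, gets NaN instead of a correlation, which flags genes where the model captures no between-individual variation.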

Selected candidate(s) can receive a stipend directly from the faculty advisor. This is not a guarantee of payment, and the total amount is subject to available funding.

Faculty Advisor

  • Professor: Badri Vardarajan
  • Center/Lab:
  • Location: 630 W 168th Street, 19th Floor, New York, NY 10032
  • Bioinformatics and computational multi-omics analysis in Alzheimer’s Disease

Project Timeline

  • Earliest starting date: 9/15/21
  • End date: 5/31/22
  • Number of hours per week of research expected during Fall 2021: ~20

Candidate requirements

  • Skill sets: Machine Learning, Bioinformatics, Data Science
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F1 or J1 visa: eligible
  • Academic Credit Possible: Yes
  • Additional comments: Knowledge of bioinformatics and deep learning is a plus