One method for identifying novel gene-disease associations is “trio analysis,” in which the DNA of an affected child (the proband) and of both of the child’s parents is sequenced. Comparing the genetic information of the child and the parents allows rapid identification of compound heterozygous variants (i.e., two variants in the same gene, one inherited from the mother and the other from the father).
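As a rough illustration of the logic described above, the following Python sketch flags candidate compound heterozygous genes in a single trio. It assumes unphased, biallelic genotype calls already encoded as alternate-allele counts (0, 1 or 2) for each individual, with no missing data. The `Variant` class, the function names and the gene/variant IDs are hypothetical, and the sketch omits the quality, population-frequency and functional-impact filters a real pipeline would apply.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    """One variant in a trio (hypothetical structure, not a standard format)."""
    gene: str
    variant_id: str
    child: int   # alt-allele count in the proband (0, 1 or 2)
    mother: int  # alt-allele count in the mother
    father: int  # alt-allele count in the father


def inherited_from(v: Variant) -> Optional[str]:
    """Return 'mother' or 'father' if the proband's heterozygous alt allele
    has an unambiguous parent of origin, else None."""
    if v.child != 1:
        return None  # only heterozygous proband calls are considered
    if v.mother >= 1 and v.father == 0:
        return "mother"
    if v.father >= 1 and v.mother == 0:
        return "father"
    return None  # both parents carry it (phase unknown) or neither does (possible de novo)


def compound_het_candidates(
    variants: Iterable[Variant],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each gene with >=1 maternal and >=1 paternal het variant to
    (maternal variant IDs, paternal variant IDs)."""
    by_gene: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"mother": [], "father": []}
    )
    for v in variants:
        origin = inherited_from(v)
        if origin is not None:
            by_gene[v.gene][origin].append(v.variant_id)
    return {
        gene: (sides["mother"], sides["father"])
        for gene, sides in by_gene.items()
        if sides["mother"] and sides["father"]
    }


if __name__ == "__main__":
    trio = [
        Variant("GENE_A", "v1", child=1, mother=1, father=0),
        Variant("GENE_A", "v2", child=1, mother=0, father=1),
        Variant("GENE_B", "v3", child=1, mother=1, father=0),
        Variant("GENE_B", "v4", child=1, mother=1, father=0),  # both maternal
        Variant("GENE_C", "v5", child=1, mother=1, father=1),  # ambiguous origin
        Variant("GENE_C", "v6", child=1, mother=0, father=1),
    ]
    print(compound_het_candidates(trio))  # {'GENE_A': (['v1'], ['v2'])}
```

Variants for which both parents are heterozygous are skipped because, without phasing, genotypes alone cannot tell which parent transmitted the proband’s alternate allele; heterozygous variants absent from both parents (possible de novo events) are skipped for the same reason.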

In this project, the student will be provided with a large genetic dataset of trios, along with several key quantitative and qualitative features. The student will be asked to build a cloud-based system that identifies compound heterozygous variants through trio analysis. In addition to analyzing the preliminary dataset, the student will be expected to apply the system to a control dataset and to compare and contrast the findings in a presentation to CPMG staff members, demonstrating the validity and soundness of the system they developed.

Strong knowledge of Python is required for this project, as are familiarity with data visualization tools and good communication skills. CPMG staff members will guide the student through weekly meetings; however, the student is expected to be a self-starter and to approach the problem with innovative and creative solutions.

This project is eligible for a matching fund stipend from the Data Science Institute. This is not a guarantee of payment, and the total amount is subject to available funding.

Faculty Advisor

  • Professor: Ali Gharavi
  • Center/Lab: Center for Precision Medicine and Genomics
  • Location: CUIMC
  • The mission of the Center for Precision Medicine and Genomics (CPMG) is to improve human health through high-quality research, education and clinical care.

Project Timeline

  • Earliest starting date: 3/1/2022
  • End date:
  • Number of hours per week of research expected during Spring/Summer 2022: ~10
  • Number of hours per week of research expected during Summer 2022: ~35

Candidate requirements

  • Skill sets: Fluency in at least one programming language (R, Python, Perl, Java), at least one course in statistics, and knowledge of genetics.
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F1 or J1 visa: eligible
  • Academic Credit Possible: Yes