Identifying disease subtypes by comorbidity clustering using structured data from electronic health records
A major challenge to implementing precision medicine arises from patients who share a clinical diagnosis but have different biological causes of disease. Disease subtypes that arise from hidden etiological heterogeneity create inefficiencies in healthcare and reduce statistical power in clinical trials and research studies. Stratifying patients into biologically homogeneous subgroups improves the potential for translational research by allowing us to design more powerful studies.
Hidradenitis suppurativa (HS) is a chronic, stigmatizing, and incapacitating skin disease. Patients have a high burden of comorbidities. We hypothesize that sets of comorbidities that tend to present together in individual patients can be used to identify biologically relevant disease subtypes. We have a dataset that contains diagnosis codes for a cohort of 122,880 research participants in which we have identified HS cases, controls and unclassified subjects. We aim to identify ICD codes that are enriched among cases, reduce collinearity among ICD codes, and then perform unsupervised clustering to identify sets of diseases that tend to present together among cases. A replication cohort is available.
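To make the three analysis steps concrete, here is a minimal sketch on toy data. It is an illustration under stated assumptions, not the project's actual method: the code names (CODE_A to CODE_F), the 24 toy participants, the one-sided Fisher exact test, the phi-correlation cutoff of 0.9, and the average-linkage clustering on Jaccard distance with a cutoff of 0.5 are all hypothetical choices made for this example.

```python
from math import comb, sqrt

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment among cases.
    a/b = cases with/without the code, c/d = controls with/without."""
    n_cases, n_with, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_with)
    tail = sum(comb(n_with, k) * comb(n - n_with, n_cases - k)
               for k in range(a, upper + 1))
    return tail / comb(n, n_cases)

def phi(x, y):
    """Phi (Pearson) correlation between two 0/1 vectors."""
    n, n11, nx, ny = len(x), sum(i and j for i, j in zip(x, y)), sum(x), sum(y)
    denom = sqrt(nx * (n - nx) * ny * (n - ny))
    return 0.0 if denom == 0 else (n * n11 - nx * ny) / denom

def jaccard_distance(x, y):
    both = sum(1 for i, j in zip(x, y) if i and j)
    either = sum(1 for i, j in zip(x, y) if i or j)
    return 1.0 if either == 0 else 1.0 - both / either

def enriched_codes(data, is_case, alpha):
    """Step 1: keep codes that are more frequent in cases than controls."""
    out = []
    for code, col in data.items():
        a = sum(1 for v, k in zip(col, is_case) if v and k)
        b = sum(1 for v, k in zip(col, is_case) if not v and k)
        c = sum(1 for v, k in zip(col, is_case) if v and not k)
        d = sum(1 for v, k in zip(col, is_case) if not v and not k)
        if fisher_enrichment_p(a, b, c, d) < alpha:
            out.append(code)
    return out

def prune_collinear(columns, codes, max_abs_phi):
    """Step 2: greedily drop codes nearly collinear with one already kept."""
    kept = []
    for code in codes:
        if all(abs(phi(columns[code], columns[k])) < max_abs_phi for k in kept):
            kept.append(code)
    return kept

def cluster_codes(columns, codes, max_distance):
    """Step 3: average-linkage agglomerative clustering of codes."""
    clusters = [[code] for code in codes]

    def linkage(c1, c2):
        ds = [jaccard_distance(columns[p], columns[q]) for p in c1 for q in c2]
        return sum(ds) / len(ds)

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def indicator(n, positives):
    return [1 if i in positives else 0 for i in range(n)]

# Hypothetical toy cohort: participants 0-11 are cases, 12-23 are controls.
is_case = [1] * 12 + [0] * 12
data = {
    "CODE_A": indicator(24, range(0, 6)),
    "CODE_B": indicator(24, range(0, 5)),
    "CODE_C": indicator(24, range(7, 12)),
    "CODE_D": indicator(24, range(6, 11)),
    "CODE_E": indicator(24, range(0, 6)),   # duplicate of CODE_A
    "CODE_F": indicator(24, {0, 3, 6, 9, 12, 13, 14, 15}),  # also in controls
}

enriched = enriched_codes(data, is_case, alpha=0.05)
# Collinearity and clustering are assessed among cases only.
case_cols = {code: [v for v, k in zip(data[code], is_case) if k]
             for code in enriched}
kept = prune_collinear(case_cols, enriched, max_abs_phi=0.9)
clusters = cluster_codes(case_cols, kept, max_distance=0.5)
print(enriched, kept, clusters, sep="\n")
```

In this toy cohort, CODE_F is dropped at the enrichment step because it is equally common in controls, CODE_E is dropped as a duplicate of CODE_A, and the remaining codes group into two sets. A real analysis would also need to correct for multiple testing across thousands of ICD codes and to choose thresholds and a clustering method suited to the data.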
One selected candidate will receive a stipend via the DSI Scholars program. The amount is subject to available funding.
Faculty Advisor
- Professor: Lynn Petukhova
- Department/School: Dermatology/P&S Epidemiology/MSPH
- Location: Russ Berrie 303A
- The overall goal of our research program is to use information in the human genome to improve the care of patients who suffer from inflammatory skin diseases. We use clinical and genetic data to define genetic architecture and to identify disease subtypes.
Project Timeline
- Earliest starting date: 3/1/2020
- End date: 8/1/2020
- Number of hours per week of research expected during Spring 2020: ~5
- Number of hours per week of research expected during Summer 2020: ~10
Candidate requirements
- Skill sets: experience with unsupervised clustering methods
- Student eligibility: freshman, sophomore, junior, senior, master’s
- International students on F1 or J1 visa: eligible
- Additional comments: Experience using unsupervised clustering methods is required.