The objective of this project is to construct linkages across disparate public health data systems using machine learning tools and to assess those linkages for bias and for equitable representation of subpopulations defined by demographic and socioeconomic factors.

There is a growing emphasis on the potential for data science initiatives to revolutionize medicine and healthcare. Less attention has been paid to applying data science techniques to the extensive data already collected and maintained by public health agencies, which could transform our ability to monitor, evaluate and improve population health. Our approaches to data linkage must be both accurate and equitable so that the advances in public health they stimulate benefit all communities. Through this project, we aim to identify how matching algorithms can be optimized to improve performance across diverse populations and, ultimately, to validate the use of these data linkages for epidemiologic research.

The selected student will work onsite at the NYC Department of Health in Queens, NY and will be co-mentored by Columbia faculty and Department of Health scientists. Tasks may include writing, reviewing and testing scripts within the AI-based matching software; identifying key predictive features to optimize the matching algorithm; summarizing key results; and improving system documentation.

Selected candidate(s) will receive a stipend directly from the faculty advisor. The amount is subject to available funding.

Faculty Advisor

  • Professor: Jeanette Stingone
  • Department/School: Epidemiology/Mailman School of Public Health
  • Location: ARB Room 1608
  • Dr. Stingone is an environmental epidemiologist who collaborates closely with the NYC Department of Health. She conducts research that couples data science techniques with existing public health data to investigate research questions related to children’s health in urban settings.

Project Timeline

  • Earliest starting date: 5/4/2020
  • End date: 8/21/2020
  • Number of hours per week of research expected during Spring 2020: ~8
  • Number of hours per week of research expected during Summer 2020: ~35

Candidate requirements

  • Skill sets: Java, R, SQL; familiarity with machine learning algorithms
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F-1 or J-1 visa: eligible
  • Additional comments: The student will need to complete Human Subjects Training. The project site is located at the Department of Health in Long Island City, Queens, NY.