Data is central to the NYC Department of Health’s mission to protect and promote the health of all New Yorkers. The agency’s many programs often require large-scale record linkages that integrate data on individuals across multiple public health data systems and disease registries. We are implementing a Master Person Index (MPI) system to centralize, optimize and standardize matching methodology for administrative data across the Department of Health.

Epidemiologists use linked datasets to work on a range of public health issues; disease-specific teams also conduct surveillance for over 90 reportable diseases and conditions using these linkages. In 2015, an intra-agency Matching Work Group found that record linkage, whether conducted for research and evaluation, surveillance, or clinical practice, was extremely or prohibitively labor intensive, and sometimes highly duplicative. Through a collaboration between Columbia University and the NYC Department of Health and Mental Hygiene, we are seeking a data scientist with Java, machine learning and/or SQL skills, along with good writing, administrative and organizational abilities, to contribute to the developing infrastructure and modeling of an MPI and data management solution. The intern would work directly with scientists and researchers from both institutions.

The intern would focus their efforts on one or more of the following:

  • enhance the user interface
  • write, review, test and run scripts against the MPI and improve MPI services
  • evaluate and optimize the machine learning data-matching model
  • improve system documentation and project planning

One selected candidate will receive a stipend via the DSI Scholars program. The amount is subject to available funding.

Faculty Advisor

  • Professor Jeanette Stingone
  • Department/School: Epidemiology/Mailman School of Public Health
  • Location: Work will be conducted at the NYC Department of Health and Mental Hygiene in Long Island City
  • Through an established collaboration with the New York City Department of Health and Mental Hygiene, Dr. Stingone’s research seeks to efficiently use big public health data to advance research in children’s health. Research is conducted throughout the data lifecycle, from generating complex datasets on large populations through data-linkage efforts to application of machine learning algorithms to generate hypotheses within high-dimensional exposure data.

Project timeline

  • Earliest starting date: 03/04/2019
  • End date: 08/23/2019
  • Number of hours per week of research expected during Spring 2019: ~10
  • Number of hours per week of research expected during Summer 2019: ~40

Candidate requirements

  • Skill sets: Programming proficiency in Java and SQL; experience with machine learning
  • Student eligibility (as of Spring 2019): freshman, sophomore, junior, senior, master’s
  • International students on F-1 or J-1 visa: NOT eligible
  • Additional comments: Human Subjects Protection Training will be required at start of position.