The PHIA (Population-based HIV Impact Assessment) project is a multi-country survey program that has interviewed more than 450,000 people of all ages in Africa and tested them for HIV. We are currently conducting a second round of surveys in many countries and hope to use best practices in big data management to generate a combined dataset across all countries. We want to combine these data with environmental, mobility, and social media data, and then use machine learning to identify trends in HIV incidence, treatment disruption, and risk factors. We are also interested in exploring other ways to use environmental data to predict potential zoonotic outbreaks.

The project has a large multi-country dataset from population-based surveys. We hope to apply machine learning to it and will seek a big data R01 grant.

Outcome

Collaboration on a big data project

Learning opportunity

How to apply machine learning to datasets covering more than 450,000 people

This is an UNPAID research project.

Faculty Advisor

  • Professor: Andrea Low
  • Department/School: ICAP at Columbia
  • Location: CUMC

Project Timeline

  • Anticipated workload: a few hours for proposal input, more if funding is obtained
  • Duration: more work is expected this summer and then in the fall

Candidate requirements

  • Skills required: machine learning
  • Additional domain knowledge: the datasets come primarily from HIV surveys, so some knowledge of HIV is preferred
  • Student eligibility: freshman, sophomore, junior, senior, master’s