In this project, we will expand on the existing family of supervised topic models. These models extend latent Dirichlet allocation (LDA) to document collections in which each document comes with additional observed labels or values of interest. More specifically, one goal of the project is to use additional document-level data, such as author information, to develop better tools for exploratory data analysis.
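
To make the idea of pairing each document with an observed value concrete, here is a minimal sketch of the generative process of one well-known member of this family, supervised LDA (sLDA). It is an illustration only: the function names, the toy parameters, and the choice of a Gaussian response are assumptions, and the models developed in the project may differ.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, topics, eta, sigma, n_words, rng):
    """Generate one document and its response under an sLDA-style model.

    alpha  : Dirichlet prior over topic proportions (length K)
    topics : K word distributions, each a list of probabilities over the vocabulary
    eta    : regression coefficients linking topic usage to the response (length K)
    sigma  : standard deviation of the response noise
    All argument names are hypothetical and chosen for this illustration.
    """
    k = len(alpha)
    vocab_ids = range(len(topics[0]))
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    words, counts = [], [0] * k
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]       # topic assignment for this word
        w = rng.choices(vocab_ids, weights=topics[z])[0]  # word drawn from that topic
        words.append(w)
        counts[z] += 1
    z_bar = [c / n_words for c in counts]  # empirical topic frequencies in the document
    mean = sum(e * zb for e, zb in zip(eta, z_bar))
    y = rng.gauss(mean, sigma)  # document-level response, the "supervised" part
    return words, z_bar, y
```

Calling `generate_document([0.5, 0.5], [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]], [-1.0, 1.0], 0.1, 20, random.Random(0))` returns a list of 20 word ids, the topic frequencies for that document, and a response whose mean depends on which topic dominates. Fitting such a model reverses this process: the topics and the coefficients are inferred from the observed words and responses.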

Outcome

Students will have the opportunity to learn about latent variable models of text and gain hands-on experience working with them.

Learning opportunity

Through this project, students will learn about various topic models.

One selected candidate may receive a stipend via the DSI Scholars program. The amount is subject to available funding.

Faculty Advisor

Project Timeline

  • Anticipated workload: 15-20 hrs/week for 6 weeks
  • Duration: summer period (June-August)

Candidate requirements

  • Skills required:
    • Programming experience with Python.
    • We are seeking students who have prior exposure to machine learning and natural language processing through a course or a research project.
    • The ideal candidate would have prior working experience with topic models.
    • Experience working with large datasets is a definite plus.
  • Additional domain knowledge: NLP, ML, probabilistic modeling (ideally)
  • Student eligibility: freshman, sophomore, junior, senior, master’s