Mixture models are a popular technique for clustering and density estimation due to their simplicity and ease of use. However, their success relies crucially on specific assumptions these models make about the underlying data distribution. Gaussian mixture models, for instance, assume that the subpopulations within the data are Gaussian-like, and can therefore give poor predictions on datasets with more complex intrinsic structure. A common approach in such situations is to resort to a more complex data model. An interesting but sparsely explored alternative is to find feature transformations that preserve the salient cluster information while simplifying the subpopulation structure, in effect making mixture models highly effective.
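
As a minimal illustration of the failure mode described above (not part of the project description itself, and assuming scikit-learn as a dependency), the sketch below fits a Gaussian mixture to the classic "two moons" dataset, whose clusters are far from Gaussian-like:

    from sklearn.datasets import make_moons
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import adjusted_rand_score

    # Two interleaving half-circles: the true clusters are not Gaussian-like.
    X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

    # Fit a two-component Gaussian mixture and read off hard cluster assignments.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    y_pred = gmm.predict(X)

    # Agreement with the true clustering is typically poor here, since a single
    # Gaussian component cannot capture a curved, non-convex subpopulation.
    print("Adjusted Rand index:", adjusted_rand_score(y_true, y_pred))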

The goal of this project is to explore such transformations. Previous literature indicates that random projections can make data more Gaussian-like. By analyzing kernelized versions of random projections, one can potentially leverage this property to simplify complex data and design fast and effective clustering and density estimation techniques.
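
A rough sketch of the basic (non-kernelized) pipeline the project builds on is shown below: apply a random linear projection to high-dimensional data, then fit a Gaussian mixture in the lower-dimensional space. This again assumes scikit-learn; the kernelized variant of the projection is the part left open for exploration.

    from sklearn.datasets import make_blobs
    from sklearn.random_projection import GaussianRandomProjection
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import adjusted_rand_score

    # Synthetic high-dimensional data with cluster structure.
    X, y_true = make_blobs(n_samples=1000, n_features=200, centers=5, random_state=0)

    # Random projection to a much lower dimension; per the literature cited above,
    # projected data tends to look more Gaussian-like, and the mixture fit is
    # considerably cheaper in the reduced space.
    proj = GaussianRandomProjection(n_components=20, random_state=0)
    X_low = proj.fit_transform(X)

    # Fit the mixture model on the projected data and compare to the true labels.
    gmm = GaussianMixture(n_components=5, random_state=0).fit(X_low)
    y_pred = gmm.predict(X_low)

    print("Adjusted Rand index after projection:", adjusted_rand_score(y_true, y_pred))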

This project is NOT accepting applications.

Faculty Advisor

  • Professor: Nakul Verma
  • Department/School: Computer Science
  • Location: CEPSR 726

Project Timeline

  • Earliest starting date: 10/15/2019
  • End date: 12/31/2019
  • Number of hours per week of research expected during Fall 2019: ~10

Candidate requirements

  • Skill sets:
    • Metric Embeddings
    • Random Projections
    • Clustering
    • Density Estimation
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F1 or J1 visa: eligible