Mixture models are a popular technique for clustering and density estimation due to their simplicity and ease of use. However, their success relies crucially on specific assumptions these models make about the underlying data distribution. Gaussian mixture models, for instance, assume that the subpopulations within the data are Gaussian-like, and can therefore give poor predictions on datasets with more complex intrinsic structure. A common approach in such situations is to resort to a more complex data model. An interesting but sparsely explored alternative is to find feature transformations that preserve the salient cluster information while simplifying the subpopulation structure, in effect making mixture models highly effective.
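
As a minimal illustration of the failure mode described above (not part of the project description itself, and assuming scikit-learn as a dependency), the sketch below fits a Gaussian mixture to the classic "two moons" dataset, whose clusters are far from Gaussian-like:

    from sklearn.datasets import make_moons
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import adjusted_rand_score

    # Two interleaving half-circles: the true clusters are not Gaussian-like.
    X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

    # Fit a two-component Gaussian mixture and read off hard cluster assignments.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    y_pred = gmm.predict(X)

    # Agreement with the true clustering is typically poor here, since a single
    # Gaussian component cannot capture a curved, non-convex subpopulation.
    print("Adjusted Rand index:", adjusted_rand_score(y_true, y_pred))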

The goal of this project is to explore such transformations. Previous literature indicates that random projections can make data more Gaussian-like. By analyzing kernelized versions of random projections, one can potentially leverage this property to simplify complex data and design fast and effective clustering and density estimation techniques.
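
A rough sketch of the basic (non-kernelized) pipeline the project builds on is shown below: apply a random linear projection to high-dimensional data, then fit a Gaussian mixture in the lower-dimensional space. This again assumes scikit-learn; the kernelized variant of the projection is the part left open for exploration.

    from sklearn.datasets import make_blobs
    from sklearn.random_projection import GaussianRandomProjection
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import adjusted_rand_score

    # Synthetic high-dimensional data with cluster structure.
    X, y_true = make_blobs(n_samples=1000, n_features=200, centers=5, random_state=0)

    # Random projection to a much lower dimension; per the literature cited above,
    # projected data tends to look more Gaussian-like, and the mixture fit is
    # considerably cheaper in the reduced space.
    proj = GaussianRandomProjection(n_components=20, random_state=0)
    X_low = proj.fit_transform(X)

    # Fit the mixture model on the projected data and compare to the true labels.
    gmm = GaussianMixture(n_components=5, random_state=0).fit(X_low)
    y_pred = gmm.predict(X_low)

    print("Adjusted Rand index after projection:", adjusted_rand_score(y_true, y_pred))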

This project is NOT accepting applications.

Faculty Advisor

  • Professor: Nakul Verma
  • Department/School: Computer Science
  • Location: CEPSR 726

Project Timeline

  • Earliest starting date: 10/15/2019
  • End date: 12/31/2019
  • Number of hours per week of research expected during Fall 2019: ~10

Candidate requirements

  • Skill sets:
    • Metric Embeddings
    • Random Projections
    • Clustering
    • Density Estimation
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F1 or J1 visa: eligible