Designing Fair Representations with Provable Guarantees
Designing high-quality prediction models while maintaining social equity (with respect to ethnicity, gender, age, etc.) is critical in today’s world. Most recent research in algorithmic fairness focuses on developing fair machine learning algorithms, such as fair classification, fair regression, or fair clustering. Nevertheless, it can sometimes be more useful to simply preprocess the data so as to “remove” sensitive information from the input feature space, thus minimizing potential discrimination in subsequent prediction tasks. We call this a “fair representation” of the data. A key advantage of using a fair data representation is that a practitioner can run any off-the-shelf algorithm on it and still maintain social equity, without having to worry about fairness explicitly.
The main appeal of fair representations is that they obviate the need to trust the users of the data: fairness is essentially guaranteed by the representation itself. While this is an appealing property, very little work has been done to properly quantify such guarantees. The goal of this project is to formalize such fairness guarantees using only the given training data. By analyzing information-theoretic relationships between the data representation and the sensitive attribute, we can quantify the social-equity level of any data representation from finite samples. This work would not only provide a “fairness certificate” for any given data representation, but would also help consumers gain more trust in the social equity of ML models.
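As a rough illustration of the kind of quantity involved, here is a minimal sketch of a plug-in estimate of the mutual information I(Z; S) between a (discretized) representation Z and a sensitive attribute S, computed from paired finite samples. The function name and the toy data are illustrative assumptions, not the project's actual method or guarantees; the project concerns formal finite-sample bounds, which a naive plug-in estimate like this does not provide.

```python
from collections import Counter
from math import log2

def empirical_mutual_information(z_samples, s_samples):
    """Plug-in estimate (in bits) of I(Z; S) from paired finite samples.

    z_samples: discretized representation values.
    s_samples: sensitive attribute values (same length, paired).
    I(Z; S) close to 0 suggests Z reveals little about S.
    """
    n = len(z_samples)
    pz = Counter(z_samples)              # empirical marginal of Z
    ps = Counter(s_samples)              # empirical marginal of S
    pzs = Counter(zip(z_samples, s_samples))  # empirical joint
    mi = 0.0
    for (z, s), count in pzs.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((pz[z] / n) * (ps[s] / n)))
    return mi

# Toy check: a representation independent of the attribute gives I ≈ 0,
# while a representation that copies the attribute gives I = H(S).
s = [0, 1, 0, 1, 0, 1, 0, 1]
z_indep = [0, 0, 1, 1, 0, 0, 1, 1]  # carries no information about s
z_copy = list(s)                    # fully reveals s
print(empirical_mutual_information(z_indep, s))  # 0.0 bits
print(empirical_mutual_information(z_copy, s))   # 1.0 bit (= H(S) here)
```

Note that such plug-in estimates are biased upward on small samples; relating them rigorously to a fairness guarantee is precisely the kind of finite-sample analysis the project description alludes to.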
This project is NOT accepting applications.
- Professor: Nakul Verma
- Department/School: Computer Science
- Location: CEPSR 726
- Earliest starting date: 10/15/2019
- End date: 12/31/2019
- Number of hours per week of research expected during Fall 2019: ~8
- Skill sets:
  - Algorithmic Fairness
  - PAC learning
  - Information Theory
- Student eligibility: freshman, sophomore, junior, senior, master’s
- International students on F1 or J1 visa: eligible