The growing use of digital technologies in the education system has generated large amounts of data that record educational processes at a granular level. This project aims to leverage large-scale text data together with NLP and causal inference techniques to understand the interplay among instructional contexts, students’ day-to-day online communication experience, and systematic inequality in academic achievement. This understanding can help educators create a more inclusive and effective educational environment that promotes engagement and a sense of belonging for students from marginalized groups, thereby reducing existing inequities in the system.

Specifically, we will work with multiple years of online discussion forum data from a few minority-serving institutions in the US, covering millions of posts, hundreds of thousands of students, and thousands of courses. These digital records can also be linked to detailed administrative records. We will combine cutting-edge NLP techniques with existing education research to develop and validate generalizable measures and representations of important linguistic patterns in students’ writing, peer interaction, discussion prompts, and instructional policies. Based on these computational representations, we will use causal inference methods to estimate the degree to which instructional contexts shape socio-demographic disparities in students’ communication experience and contribute to academic gaps over time. The end goal of this project is twofold. First, we will produce a handful of scholarly papers that “open the black box” of educational inequality. Second, we will incorporate our computational methods into future field interventions (currently in planning) that help our under-resourced institutional partners leverage data analytics for equity-oriented decision making.

This is an UNPAID research project.

Faculty Advisor

  • Professor: Renzhe Yu
  • Center/Lab: Human Development
  • Location: Grace Dodge Hall, 525 W 120th St

Project Timeline

  • Earliest starting date: 4/1/2023
  • End date:
  • Number of hours per week of research expected during Spring-Summer 2023: ~10

Candidate requirements

  • Skill sets: Qualified students should be skilled in NLP with machine learning (PyTorch, Hugging Face, etc.) and statistical methods (causal inference is a plus), and should have an interest in computational social science. The scholar will work with the research team to contribute to all aspects of the project and lead additional analyses.
  • Student eligibility: freshman, sophomore, junior, senior, master’s
  • International students on F-1 or J-1 visas: eligible
  • Academic Credit Possible: No