A longitudinal study of instructional contexts, student communication, and educational inequality using natural language processing

February 22, 2023 in Open Spring-Summer 2023, Open Flexible Timeline

The growing use of digital technologies in education has generated large amounts of data that record educational processes at a granular level. This project aims to apply natural language processing (NLP) and causal inference techniques to large-scale text data to understand the interplay between instructional contexts, students’ day-to-day online communication experience, and systematic inequality in academic achievement. This understanding can help educators create a more inclusive and effective educational environment that promotes engagement and a sense of belonging for students from marginalized groups, thereby reducing existing inequities in the system.

Artificial Intelligence and Public Policy

February 22, 2023 in Open 2023, Open Flexible Timeline

In this project we’ll be expanding on an existing family of supervised topic models. These models extend latent Dirichlet allocation (LDA) to document collections in which we observe additional labels or values of interest for each document. More specifically, one of the goals of this project is to use additional document-level data, such as regulatory discretion, to develop better data modelling tools.

Association of polygenic risk score for obesity with eating behaviors

February 22, 2023 in Open 2023, Open Flexible Timeline

This project will generate polygenic risk scores for obesity for approximately 250 subjects from two different datasets, using existing R- and Python-based tools. The student will also need to be familiar with the Unix platform. The association of the polygenic risk scores with eating behaviors will then be tested.
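
For readers unfamiliar with the term, a polygenic risk score is commonly computed as a weighted sum of a subject's effect-allele dosages, with per-variant effect sizes as the weights. The sketch below illustrates only that arithmetic. The variant IDs, weights, subject labels, and the function name are hypothetical placeholders, and it does not represent the specific tools or pipeline this project will use.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Return the weighted sum of effect-allele dosages.

    Only variants present in both mappings contribute to the score.
    """
    shared_variants = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared_variants)


# Hypothetical per-variant effect sizes (illustrative values, not real estimates).
weights = {"rs0001": 0.5, "rs0002": -0.25, "rs0003": 0.125}

# Hypothetical effect-allele dosages (0, 1, or 2 copies) for two subjects.
subjects = {
    "subject_a": {"rs0001": 2, "rs0002": 1, "rs0003": 0},
    "subject_b": {"rs0001": 0, "rs0002": 2, "rs0003": 1},
}

scores = {sid: polygenic_risk_score(d, weights) for sid, d in subjects.items()}
for sid, score in scores.items():
    print(sid, score)
```

In practice the weights would come from a published genome-wide association study, and the resulting scores would then be tested for association with the eating-behavior measures.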

Broadband Equity, Access and Deployment: Analyzing internet for rural areas

February 22, 2023 in Open 2023, Open Flexible Timeline

I’m currently working, on loan, for NTIA (ntia.gov) on BEAD (Broadband Equity, Access and Deployment), a roughly $40 billion project to deploy high-speed internet to all or most locations that currently lack access. We have a public and semi-public data set that lists every home and business in the United States, as well as broadband deployments and government grants. The project will answer questions such as: What will it cost to deploy fiber? Where are community anchor institutions located? Which locations are already being subsidized? Which locations without service are in high-poverty areas?

Large Language Models for Clinical Evidence Computing

February 22, 2023 in Open 2023, Open Flexible Timeline

We will leverage and extend large language models, such as ChatGPT and GPT-3, to retrieve, appraise, and synthesize clinical evidence for patients and clinicians. Students with a strong background in large language models and natural language processing will be preferred. We will be working closely with clinicians to fine-tune the methods.