The growing use of digital technologies in the education system has generated large amounts of data that record educational processes at a granular level. This project aims to leverage large-scale text data, together with NLP and causal inference techniques, to understand the interplay between instructional contexts, students’ day-to-day online communication experience, and systematic inequality in academic achievement. This understanding can help educators create a more inclusive and effective educational environment that promotes engagement and a sense of belonging for students from marginalized groups, thereby reducing existing inequities in the system.
In this project we’ll be expanding on an existing family of supervised topic models. These models extend latent Dirichlet allocation (LDA) to document collections where, for each document, we observe additional labels or values of interest. More specifically, one of the goals of this project is to use additional document-level data, such as regulatory discretion, to develop better data modelling tools.
This project will generate polygenic risk scores for obesity for ~250 subjects from two different datasets, using existing R- and Python-based tools. The student will also need to be familiar with the Unix platform. The association of polygenic risk scores with eating behaviors will be tested.
I’m currently working, on loan, for NTIA (ntia.gov) on BEAD (Broadband Equity, Access and Deployment), a roughly $40 billion project to deploy high-speed internet to all or most locations that currently lack access. We have a public and semi-public data set that lists every home and business in the United States, as well as broadband deployments and government grants. The project will answer questions such as: What will it cost to deploy fiber? Where are community anchor institutions located? What locations are already being subsidized? Which locations without service are in high-poverty areas?
We will leverage and extend large language models, such as ChatGPT or GPT-3, to retrieve, appraise, and synthesize clinical evidence for patients and clinicians. Students with a strong background in large language models and natural language processing will be preferred. We will work closely with clinicians to fine-tune the methods.
The goal of this project is to develop and mathematically analyze simple models of empirical phenomena observed in deep learning.