Website Outdated

October 12, 2023 in Announcements

This website is no longer in use. You can find up-to-date information about the DSI Scholars Program on the DSI website.

Developing a Large Database of Medication Ordering Errors

March 18, 2023 in Open 2023

Over 400,000 deaths per year are the result of preventable harm,[1] and medication errors account for a large proportion of this harm.[2] Errors at the ordering stage in particular are especially likely to cause serious patient harm.[3-5]

A longitudinal study of instructional contexts, student communication, and educational inequality using natural language processing

February 22, 2023 in Open Spring-Summer 2023, Open Flexible Timeline

The growing use of digital technologies in education has generated large amounts of data that record educational processes at a granular level. This project aims to combine large-scale text data with natural language processing (NLP) and causal inference techniques to understand the interplay between instructional contexts, students’ day-to-day online communication experiences, and systematic inequality in academic achievement. This understanding can help educators create a more inclusive and effective learning environment that promotes engagement and a sense of belonging for students from marginalized groups, thereby reducing existing inequities in the system.

Artificial Intelligence and Public Policy

February 22, 2023 in Open 2023, Open Flexible Timeline

In this project, we’ll expand on an existing family of supervised topic models. These models extend latent Dirichlet allocation (LDA) to document collections in which, for each document, we also observe additional labels or values of interest. More specifically, one goal of this project is to use additional document-level data, such as regulatory discretion, to develop better data modelling tools.
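The project description does not say which supervised topic model it builds on. One well-known member of this family is supervised LDA (sLDA), in which each document's observed value is drawn from a normal distribution whose mean is a linear function of the document's empirical topic frequencies. The sketch below simulates only that generative story, under stated assumptions: the two-topic setup, the four-word vocabulary, and the weights `eta` are hypothetical, and the response `y` stands in for a numeric document-level value such as a regulatory-discretion score. Fitting the model (e.g. by variational EM) is not shown.

```python
import random

def dirichlet(alpha, rng):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, topics, eta, sigma, rng):
    """Sample one document (word ids) and its response y under supervised LDA."""
    k = len(topics)
    theta = dirichlet(alpha, rng)  # per-document topic proportions
    counts = [0] * k
    words = []
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]  # topic assignment for this word
        w = rng.choices(range(len(topics[z])), weights=topics[z])[0]  # word drawn from topic z
        counts[z] += 1
        words.append(w)
    z_bar = [c / n_words for c in counts]  # empirical topic frequencies
    mean = sum(e * f for e, f in zip(eta, z_bar))  # linear predictor eta . z_bar
    y = rng.gauss(mean, sigma)  # document-level response (hypothetical score)
    return words, z_bar, y

rng = random.Random(0)
alpha = [0.5, 0.5]               # hypothetical Dirichlet prior over 2 topics
topics = [[0.4, 0.4, 0.1, 0.1],  # hypothetical topic-word distributions
          [0.1, 0.1, 0.4, 0.4]]  # over a 4-word vocabulary
eta = [2.0, -1.0]                # hypothetical per-topic regression weights

for _ in range(3):
    words, z_bar, y = generate_document(20, alpha, topics, eta, 0.1, rng)
    print(words, [round(f, 2) for f in z_bar], round(y, 2))
```

Because the response depends on the topic assignments, fitting such a model pushes the topics toward structure that also predicts the document-level value, which is what distinguishes it from running plain LDA and regressing afterwards.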

Association of polygenic risk score for obesity with eating behaviors

February 22, 2023 in Open 2023, Open Flexible Timeline

This project will generate polygenic risk scores for obesity for approximately 250 subjects from two different datasets, using existing R- and Python-based tools. The student will also need to be familiar with the Unix platform. The association between the polygenic risk score and eating behaviors will then be tested.

Broadband Equity, Access and Deployment: Analyzing internet for rural areas

February 22, 2023 in Open 2023, Open Flexible Timeline

I’m currently working, on loan, for NTIA (ntia.gov) on BEAD (Broadband Equity, Access and Deployment), a roughly $40 billion program to deploy high-speed internet to all or most locations that currently lack access. We have a public and semi-public dataset that lists every home and business in the United States, as well as broadband deployments and government grants. The project will answer questions such as: What will it cost to deploy fiber? Where are community anchor institutions located? Which locations are already being subsidized? Which locations without service are in high-poverty areas?
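Several of these questions reduce to filtering location records. The sketch below shows how the last question might be expressed, assuming each location carries a service flag, a subsidy flag, and the poverty rate of its surrounding area. The dataset's actual schema is not described here, so the `Location` fields, the sample records, and the 20% poverty threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Location:
    """One home or business; all field names here are hypothetical."""
    location_id: str
    has_service: bool         # already has high-speed service?
    already_subsidized: bool  # covered by an existing government grant?
    poverty_rate: float       # poverty rate of the surrounding area, 0..1

def unserved_high_poverty(locations, threshold=0.20, exclude_subsidized=False):
    """Return IDs of unserved locations in areas at or above the poverty threshold."""
    return [
        loc.location_id
        for loc in locations
        if not loc.has_service
        and loc.poverty_rate >= threshold
        and not (exclude_subsidized and loc.already_subsidized)
    ]

# Hypothetical sample records.
locations = [
    Location("A", has_service=False, already_subsidized=False, poverty_rate=0.25),
    Location("B", has_service=False, already_subsidized=True, poverty_rate=0.30),
    Location("C", has_service=True, already_subsidized=False, poverty_rate=0.40),
    Location("D", has_service=False, already_subsidized=False, poverty_rate=0.10),
]

print(unserved_high_poverty(locations))                           # ['A', 'B']
print(unserved_high_poverty(locations, exclude_subsidized=True))  # ['A']
```

The `exclude_subsidized` switch combines two of the listed questions: it drops unserved, high-poverty locations that an existing grant already covers.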

Call for Student Applications - Spring-Summer 2023

February 22, 2023 in Announcement

The Columbia University Data Science Institute is pleased to announce that the Data Science Institute (DSI) and Data For Good Scholars programs for Spring-Summer 2023 are now open for applications.

The goal of the DSI Scholars Program is to engage Columbia University’s undergraduate and master’s students in data science research with Columbia faculty through a research internship. The program connects students with research projects across Columbia and provides student researchers with an additional learning experience and networking opportunities. Through unique enrichment activities, the program aims to foster a collaborative learning community in data science at Columbia.

The Data For Good Scholars program connects student volunteers with organizations and individuals working for the social good whose projects need data science expertise. Because these projects involve real-world problems and real-world data, they are excellent opportunities for students to learn how data science is practiced outside the university setting and how to work effectively with people whose expertise lies outside data science.

Columbia Data Science Institute (DSI) Scholars Program