Our primary objective is to build a Gaussian Mixture Regression (GMR) model that corrects for bias in low-cost particulate matter (PM2.5) sensors and can be used globally. We will select 5-10 diverse co-locations of reference and low-cost PM2.5 monitors to build the model. Recently, our team showed that GMR provides a higher-quality correction factor for PurpleAir PM2.5 sensors than multiple linear regression and random forest, in terms of both correlation and accuracy. We then plan to evaluate the model on at least 20 independent co-location datasets that the GMR has not seen. There has been an exciting recent rise in commercially available low-cost sensors (LCS), such as PurpleAir (www.purpleair.com) and Clarity sensors, which, when paired with machine learning (ML) based correction algorithms, demonstrate high accuracy compared to co-located reference-grade monitors [5,6]. So far, these corrections have been limited to the few LCS locations that are co-located with expensive reference-grade monitors, while the potential of the thousands of un-co-located sensors remains untapped. PurpleAir and similar devices have been deployed all over the world. Ideally, our global correction factor will allow more trustworthy data to be extracted from huge open-access air pollution databases such as PurpleAir's.
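To make the correction step concrete, here is a minimal sketch of how a GMR prediction can be computed once a mixture has been fitted. It assumes a two-dimensional mixture over (raw LCS reading, reference reading) and returns the expected reference value given a raw reading. The function names and every parameter value in `COMPONENTS` are hypothetical and chosen for illustration only; they are not the team's fitted model, and the fitting step itself (for example, expectation-maximization on co-location data) is left out.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components):
    """Return E[y | x] under a 2-D Gaussian mixture over (x, y).

    x is a raw low-cost sensor reading; y is the reference-grade value.
    Each component is a dict of already-fitted (here: hypothetical)
    parameters: weight, mean_x, mean_y, var_x, cov_xy.
    """
    # Unnormalized responsibility of each component for this reading.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("reading is too far from every component to evaluate")

    prediction = 0.0
    for c, lik in zip(components, likelihoods):
        # Each component contributes its own local linear regression line.
        cond_mean = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        prediction += (lik / total) * cond_mean
    return prediction


# Hypothetical parameters: one component for clean air, one for polluted air.
COMPONENTS = [
    {"weight": 0.7, "mean_x": 10.0, "mean_y": 8.0, "var_x": 16.0, "cov_xy": 12.0},
    {"weight": 0.3, "mean_x": 60.0, "mean_y": 35.0, "var_x": 225.0, "cov_xy": 112.5},
]

if __name__ == "__main__":
    for raw in (5.0, 25.0, 80.0):
        print(f"raw={raw:5.1f}  corrected={gmr_predict(raw, COMPONENTS):6.2f}")
```

With a single component the prediction reduces to ordinary linear regression. With several components, the correction blends local linear fits according to how likely each component is to have produced the reading.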


The brain is the most complex organ of the body and hosts a multitude of cell types organized in functionally specialized brain regions. So far, systematic attempts to describe the complexity of the brain have been limited to a few species, including the mouse and the fruit fly. Extending the description of brains to multiple species is essential to identify evolutionarily conserved principles of brain organization and function.


The wireless revolution is fueling the demand for access to the radio frequency (RF) spectrum. Smartphones, wearables, modern cars, and smart homes are all competing for spectrum resources. Managing this increasing demand is an important and timely research challenge. Dynamic Spectrum Allocation (DSA) methods allow multiple wireless networks to collaboratively adapt in real time to dynamic RF environments. In this project, we consider intelligent wireless networks that exchange Spectrum Consumption Models (SCMs) to dynamically coordinate spectrum usage and avoid harmful interference. Students working on this project will construct SCMs based on real measurements of wireless signals, develop novel frequency coordination protocols based on SCMs, and implement the protocols in a custom-built Python simulator and/or in a Software-Defined Radio testbed.


This team project aims to investigate different approaches to presenting individuals with daily suggestions for meeting their nutritional goals. Specifically, we are interested in developing new mechanisms for choosing suggestions that satisfy an individual’s preferences and habits (e.g., suggestions similar to the individual’s previous meals, based on the analysis of textual meal descriptions), comparing the effectiveness of meal recommendations expressed as text versus those expressed as images of meals, and developing ways to balance an individual’s preferences with exposure to new meal ideas.


The project will use a Generative Adversarial Network (GAN) to generate space-time correlated renewable generation scenarios. The student will gather historical wind speed and solar radiation data from Texas and train a GAN to generate scenarios. The student will also investigate how the scenarios correlate with temperature, use average temperature as a key feature for scenario generation, and benchmark the GAN against alternative scenario generation approaches. This effort is part of a storage valuation project funded by a DSI Seed Grant in 2022, in which the generated scenarios will be used to perform storage valuation.


Discriminatory development policies have systematically relegated certain populations to undesirable locations, including low-elevation areas at risk of flooding. As the climate changes, many properties will no longer be habitable, and others, especially houses in floodplains, will suffer damage from more frequent and more severe flooding. Current U.S. federal policy funds flood risk mitigation measures such as property acquisition, relocation, and retrofitting. However, depending on various factors at the sub-county level, these actions can disproportionately benefit high-income areas and fail to reach vulnerable populations. We investigate patterns of potentially disproportionate availability of, and access to, government-linked programs, exploring different types of climatic factors using flood insurance claims data from the National Flood Insurance Program (NFIP). Work with the intern will build on existing program-wide and event-specific analyses in the Carolinas to explore patterns that may be of particular interest to state- and county-level decision makers, who can use them to evaluate how communities are benefiting from existing programs and to ensure equity. We plan to publish an event-specific research article using high-resolution data on the distribution of risks and benefits following a major disaster.


Structural variants (SVs) are large genomic alterations that can be implicated in disease. This project will focus on using novel genomic techniques to identify structural variants in genomic cold cases with neurological disorders. These “cold” cases have previously remained unsolved with standard genomic approaches. We will use optical genome mapping and long-read sequencing, together with novel bioinformatic techniques, to detect and analyze structural variants.



Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program engages and supports undergraduate and master’s students in participating in data science related research with Columbia faculty. The program’s unique enrichment activities will foster a learning and collaborative community in data science at Columbia.

Columbia University DSI

New York, NY