New York Presbyterian/Columbia University Irving Medical Center (NYP/CUIMC) serves a high number of racial/ethnic minority and low-income patients. In this project, we will create a data repository of all patients who have completed a universal screen for social determinants of health, including food insecurity, during a clinical encounter. The scholar will handle large datasets extracted from the medical record for database creation and data visualization. The dataset will include patient demographics, food security status, and clinical outcomes. This data resource will allow the scholar to partner with researchers to examine predictors of food insecurity, clinical courses, and health outcomes among a large population of patients, including during a time period prior to the COVID-19 surge in New York City. The project will be co-mentored by members of the University-wide Food Systems Network, a novel collaboration of researchers at the Medical Center, Earth Institute, SIPA, and Teachers College.

Memory is a basic function of the brain that enables us to use past experiences to serve the present and future on a daily basis. Memory function is often disrupted in neurological and psychiatric diseases, such as Alzheimer’s disease and posttraumatic stress disorder. To understand the molecular mechanism of memory storage, we will focus on DNA methylation, a chemical modification of the genome that is hypothesized to play a critical role in memory. We have identified thousands of DNA methylation changes at numerous genomic loci that occurred during the formation of fear and reward memory in the mouse brain. We will develop new computational tools to analyze these changes in DNA methylation and search for common sequence features of these genomic loci. The results of this project will lead to a systematic understanding of the principles governing the function and regulation of DNA methylation in memory, and will pave the way for new therapeutic strategies for diseases involving memory defects.

Complex microbial communities play an important role in numerous fields, from human health to bioremediation. One critical challenge in analyzing their data is separating true biological signal from contamination from various sources. While contemporary experimental procedures include various negative controls, a comprehensive statistical approach for analyzing them has not been developed. Such a framework would have a far-reaching impact on the field.

Complex microbiomes play an important role in numerous fields. One critical challenge in their data analysis is to separate true biological data from contamination. Contemporary experimental procedures include negative controls from various sources, but their analysis is complicated by “well-to-well” contamination: contamination that associates with the position of samples during experimental procedures. This causes bacteria sampled from a true biological source to appear in nearby control samples, and vice versa. An analytic approach that accounts for this source of contamination would have a far-reaching impact on the field.

Our lab is interested in aneuploidy, or the incorrect number of whole chromosomes and chromosome arms. A challenge in this area of research is that karyotypes require a large number of proliferating cells for analysis. To address this, our lab and collaborators developed new algorithms to identify aneuploidy alterations from DNA sequencing data. Here, the project goal is to implement these algorithms at Columbia, and subsequently to apply these analysis methods to samples generated in the lab and to patient samples. Building on this, the DSI student may also develop new algorithms for use with single-cell sequencing data and RNA sequencing data. Experience in one or more of the following is a must: UNIX, R, and Python. The DSI student will be mentored by Dr. Alison Taylor and will also work closely with all lab members.

This project will focus on the identification of genetic factors involved in various forms of hereditary disease, including neurodevelopmental disorders, hearing loss, skeletal disorders, and more. Some affected children endure years-long diagnostic odysseys of trial-and-error testing with inconclusive results and misdirected treatments. We are dedicated to tracking down the molecular causes of these diseases by integrating various “-omics” technologies, including genomics, transcriptomics, and epigenomics.

The goal of the project is twofold: 1) to better understand and further improve the use of low-cost air pollution sensors and 2) to analyze and characterize air pollution data in sub-Saharan Africa. Air pollution kills an estimated 700,000 people per year in Africa, but existing air pollution data in Africa are extremely sparse and estimates of the associated mortality are uncertain. Low-cost air pollution sensors have the potential to rapidly revolutionize air quality awareness and data availability in data-sparse areas of the world, including sub-Saharan Africa. However, low-cost sensors require careful calibration, performance evaluation, and other quality assurance before their data can be trusted to the same degree as data from regulatory-grade monitors. As part of a larger project led by Dr. Westervelt, fine particulate matter (PM2.5) sensors have already been deployed in several African megacities, including Kinshasa, Democratic Republic of Congo; Nairobi, Kenya; Kampala, Uganda; Accra, Ghana; and Lomé, Togo. In Kampala and Accra, sensors are co-located with a regulatory-grade PM2.5 instrument for several months, allowing for a direct comparison between low-cost and regulatory-grade PM2.5 measurements and for the development of calibration factors.
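
As a rough illustration of how co-located measurements can yield calibration factors, the sketch below fits a straight line mapping low-cost sensor readings to reference readings. The project description does not specify its calibration method; the simple linear model, the function names, and the sample values here are all assumptions made for illustration only.

```python
def fit_calibration(low_cost, reference):
    """Ordinary least-squares fit of: reference = slope * low_cost + intercept.

    Both arguments are equal-length lists of paired PM2.5 readings taken
    at the same times by the co-located instruments.
    """
    n = len(low_cost)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_x = sum(low_cost) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in low_cost)
    if sxx == 0:
        raise ValueError("low-cost readings must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(low_cost, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(raw, slope, intercept):
    """Correct raw low-cost readings using the fitted calibration factors."""
    return [slope * x + intercept for x in raw]


# Made-up paired readings, not real measurements.
low_cost = [10.0, 20.0, 30.0, 40.0]
reference = [8.0, 14.0, 20.0, 26.0]

slope, intercept = fit_calibration(low_cost, reference)
corrected = apply_calibration([50.0], slope, intercept)
```

In practice a calibration would likely also need to account for factors such as humidity and temperature, which a single-variable fit like this one ignores.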

Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program engages and supports undergraduate and master's students in participating in data science-related research with Columbia faculty. The program’s unique enrichment activities will foster a learning and collaborative community in data science at Columbia.

Columbia University DSI

New York, NY