Recent advances in genomic technologies have led to the identification of many novel disease-associated genes, enabling more precise diagnoses. Alongside the technologies that enable rapid DNA sequencing, multiple computational approaches have been developed to extract genetic information from raw data, including the Broad Institute’s GATK, Seven Bridges’ GenomeGraph and Google’s DeepVariant. These workflows can identify different sets of genetic variants from the same data, which raises the risk of missing disease-causing variants when only one of these methods is used.

With the explosive growth of medical literature, making sense of medical evidence is harder than ever. Its free-text form also makes evidence retrieval and appraisal difficult. There is a great need for tools and methods that can structure and reason over medical evidence. The goal of this project is to develop computational and symbolic methods to extract evidence from PubMed abstracts, integrate it with evidence derived from real-world clinical data (practice-based evidence), and perform automated knowledge discovery and evidence reasoning. We also hope this research can support evidence-based medicine during the COVID-19 pandemic and give students opportunities to hone their skills in natural language processing, data mining, deep learning, and semantic knowledge engineering. We have solid preliminary results for students to build upon. An open-source PICO parser that extracts Population, Intervention, Comparison and Outcome information from PubMed abstracts has been developed and published. Current COVID-19 literature has been downloaded from PubMed and pre-processed. Preliminary analyses are under way to investigate patterns in the study populations of COVID-19 clinical studies. Our next steps include, but are not limited to, evidence summarization at the study level and evidence reasoning at the problem or topic level.

This project is the first comprehensive examination of African North Americans who crossed the U.S.-Canada border in either direction after the era of the Underground Railroad, in the generation alive from roughly 1865 to 1930. It analyzes census and other records to match individuals and families across the decades, despite changes or ambiguities in their names, ages, “color,” birthplaces, or other details.

Retaining walls are structures designed to hold soil at a slope it would not naturally maintain. They were used to facilitate New York City’s development in hilly areas. Students may be familiar with the Morningside Park retaining wall, immediately east of the Columbia campus. Retaining walls facing streets are mapped and inspected; retaining walls supporting soil in backyards are not. Because the NYC Department of Buildings is unaware of most of them, they go uninspected and are left to decay. A fatal collapse occurred in October 2020.

New York State regulates construction and demolition waste (CDW), including its generation, recycling and reuse, and collects all CDW data from private waste haulers and transfer stations and recycling facilities. There is no city-level source of CDW data. CDW is a source of embedded carbon, and the city could develop innovative CDW policy by leveraging its capital program to close material loops, yielding both environmental and financial sustainability benefits. To do so, it must first understand where CDW goes after demolition as it moves through the transfer and recycling processes.

Decoding behavioral signifiers of the brain state of vigilance can have far-reaching implications for understanding actions and identifying disease. We are using high-resolution video recordings of mice as they navigate a maze, but we have access to very few pre-determined behavioral signifiers. Several recent publications have used computer vision to extract previously inaccessible aspects of behavior, including animal pose and distinguishable internal states. These descriptions enabled the identification and characterization of behavioral dynamics, which revealed an unprecedented richness in the behaviors that underlie decision making. Applying such computational approaches in our maze, with behaviors that have been validated as measures of choice and memory, could reveal dimensions of behavior that predict or even determine psychological constructs such as vigilance. DSI scholars would use pose estimation analysis to evaluate behavioral signifiers of choice and memory and relate them to our concurrent, real-time measures of neural activity and transmitter release. The students would also have the opportunity to examine how disease models known to impair performance on our maze task affect any identified signifiers.

We are conducting a study at the Columbia School of Social Work to examine associations between adolescent mental health and three kinds of data: physiological data (collected through smartwatches), sociability metrics (e.g., number of texts, time spent on social media platforms) and self-reported daily stressors (collected through mobile surveys). We will use these data to inform the development of a just-in-time adaptive intervention, delivered by smartphone, to reduce mental health problems among adolescents. Just-in-time adaptive interventions aim to provide the right type and amount of support at the right time by adapting to an individual’s changing internal and contextual state. We are looking for one or more students to develop a mobile app, compatible with both Android and iOS, that has the following capabilities:

Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program aims to engage and support undergraduate and master’s students in data science research with Columbia faculty. The program’s unique enrichment activities will foster a collaborative learning community in data science at Columbia.

Columbia University DSI

New York, NY