Broadband Funding: Racial and Economic Equity

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

The federal government spends billions of dollars a year supporting rural broadband (internet access), subsidizing build-out in low-density areas that do not have broadband (unserved areas). However, it is not clear whether the rural areas most in need are receiving a fair share of the funding. Using a very large dataset of broadband availability, census data and recent auction results, the project will analyze whether unserved areas with high racial diversity or lower median income are receiving a fair share of funding. Depending on team size, we will also attempt to create a shareable master data set building on OpenStreetMap and other sources that provides key data points for census units.

California's Water Futures: Can We Predict the Future Value of California's Water Amid Fear of Scarcity?

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Water has joined gold, oil and other commodities traded on Wall Street, highlighting worries that the life-sustaining natural resource may become scarce across more of the world. In California, the biggest U.S. agriculture market and the world’s fifth-largest economy, this challenge is particularly acute. Farmers, hedge funds and municipalities are now able to prepare for the risk that future water availability issues can bring to the state.

Characterization of spontaneous animal behaviors from 3D pose estimation data

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

A major goal in neuroscience is to understand how neuronal activity gives rise to behavior. With new technologies, it is possible to record the activity of thousands of neurons simultaneously. However, the interpretation of these data depends on a solid understanding of animal behavior.

Comparison of four workflows for structural variants identification

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Recent advances in genomic technologies have led to the identification of many novel disease-gene associations, enabling more precise diagnoses. Along with the technologies enabling rapid DNA sequencing, multiple computational approaches have been developed to identify structural variants (i.e. relatively large deletions and duplications of genomic sequences). These workflows can lead to the identification of different structural variants, raising the risk of missing disease-causing variants when using only one of those methods.

Comparison of three workflows for the identification of genetic variants

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Recent advances in genomic technologies have led to the identification of many novel disease-associated genes, enabling more precise diagnoses. Along with the technologies enabling rapid DNA sequencing, multiple computational approaches have been developed to extract the genetic information from raw data, including the Broad Institute’s GATK, Seven Bridges’ GenomeGraph and Google’s DeepVariant. These workflows can identify different genetic variants, raising the risk of missing disease-causing variants when only one of these methods is used.

COVID-19 Evidence Extraction and Computing

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

With the explosive growth of medical literature, making sense of medical evidence is harder than ever. The free-text form also makes it difficult to perform evidence retrieval or appraisal. There is a great need for tools and methods that can structure and reason over medical evidence. The goal of this project is to develop computational and symbolic methods to extract evidence from PubMed abstracts, integrate it with evidence derived from real-world clinical data (or practice-based evidence), and perform automated knowledge discovery and evidence reasoning. We also hope this research can support evidence-based medicine during the COVID-19 pandemic and provide opportunities for students to hone their skills in natural language processing, data mining, deep learning, and semantic knowledge engineering. We have solid preliminary results for the students to build upon. An open-source PICO parser that extracts Population, Intervention, Comparison and Outcome information from PubMed abstracts has been developed and published. Current COVID-19 literature has been downloaded from PubMed and pre-processed. Preliminary analyses are under way to investigate the patterns in the study populations in COVID-19 clinical studies. Our next steps include but are not limited to evidence summarization at the study level and evidence reasoning at the problem/topic level.

Decoding the elements of mouse behavior that reveal vigilance during exploration

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Decoding behavioral signifiers for the brain state of vigilance can have far-reaching implications for understanding actions and identifying disease. We are using high-resolution video recordings of mice as they navigate a maze, but have access to very few pre-determined behavioral signifiers. Several recent publications used computer vision to extract a variety of previously unreachable aspects of behavior, including animal pose estimation and distinguishable internal states. These descriptions allowed for the identification and characterization of dynamics, which then revealed an unprecedented richness in the behaviors that determine decision making. Applying such computational approaches in our maze, in the context of behaviors that have been validated to measure choice and memory, can reveal dimensions of behavior that predict or even determine psychological constructs like vigilance. DSI scholars would use pose estimation analysis to evaluate behavioral signifiers for choice and memory and relate them to our real-time concurrent measures of neural activity and transmitter release. The students would also have the opportunity to examine how disease models known to impair performance on our maze task affect any identified signifier.

Columbia Data Science Institute (DSI) Scholars Program