The Federal Communications Commission (FCC) and the Census Bureau regularly publish data on U.S. Internet availability, performance, and use, at granularities ranging from census block to county and state. The project aims to use these data to answer questions such as “How reliable is Internet access?”, “Who is deploying fiber where?”, “Can we predict the reliability of different technologies?”, and “Can we predict the deployment of fiber?”

When a colorectal cancer has grown through the wall of the colon or rectum and into adjacent tissues or organs, it is classified as a T4 primary tumor. If there is no evidence of distant metastasis, it is labeled a locally advanced tumor. Such locally advanced tumors account for approximately 5-15% of new colorectal cancers. Surgery remains the principal treatment modality for patients with locally advanced colorectal cancer. Studies have demonstrated that planned en bloc or multivisceral resections, rather than intraoperative assessment of margins, are more likely to achieve R0 resections, leading to better local control and long-term survival. However, decision-making for a surgeon confronting a T4 colorectal cancer is challenging, because surgery-related mortality rates after multivisceral resections have been reported as high as 12%.

The ocean has absorbed the equivalent of 41% of industrial-age fossil carbon emissions. In the future, the rate of this ocean carbon sink will determine how much of mankind’s emissions remain in the atmosphere and drive climate change. To quantify the ocean carbon sink, surface ocean pCO2 must be known, but it cannot be measured from satellites; instead, it requires direct sampling across the vast and dangerous oceans. Thus, there will never be enough observations to directly estimate the carbon sink as it evolves. Data science can fill this gap by offering robust approaches to extrapolate from sparse observations to full-coverage fields, given auxiliary data that can be measured remotely.

The goal of this project is to develop and validate a deep neural network that predicts a child’s emotion and cognition. DSI scholars will implement 3D convolutional neural networks on brain imaging data from thousands of children to predict cognitive, emotional, and socio-developmental variables, and will statistically evaluate the model’s performance. This scalable deep neural network analysis will help identify the brain underpinnings of cognition and emotion.

The project has collected a large dataset (>200 GB) from a cryptocurrency blockchain. It is developing methods for detecting anomalies in transactions based on recent social network analysis, graph analysis, and machine learning methods. The work involves data cleaning and wrangling, designing and implementing various algorithms, and analyzing the transactions to identify different types of anomalies and manipulation.

Our goal is to use a large pool of homecare data (including structured data, free-text clinical notes, and recorded patient-provider phone conversations) to build predictive models that help identify patients at risk for poor outcomes (such as hospital admission or falls).

A common challenge for students in proof-heavy courses is constructing a long sequence of logical arguments from the problem statement to the final solution. In doing so, they often skip steps, leading to logical leaps or outright incorrect solutions. Ideally, the instructor would identify these missteps and help students master such proof-based course material. Here we want to take a data-driven approach to address this challenge.

Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program aims to engage and support undergraduate and master’s students in participating in data science research with Columbia faculty. The program’s unique enrichment activities will foster a learning and collaborative community in data science at Columbia.

Columbia University DSI

New York, NY