California's Water Futures: Can We Predict the Future Value of California's Water Amid Fear of Scarcity?

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Water has joined gold, oil and other commodities traded on Wall Street, highlighting worries that the life-sustaining natural resource may become scarce across more of the world. In California, the biggest U.S. agriculture market and the world’s fifth-largest economy, this challenge is particularly acute. Farmers, hedge funds and municipalities are now able to prepare for the risks that future water availability problems may bring to the state.

Evaluating Machine learning algorithms in Earth science

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline

Since the industrial revolution, the atmosphere has continued to warm due to an accumulation of carbon. Terrestrial ecosystems play a crucial role in quelling the effects of climate change by storing atmospheric carbon in biomass and in soils. To inform carbon reduction policy, an accurate quantification of land-air carbon fluxes is necessary. Direct monitoring of surface carbon fluxes at a few locations across the globe provides valuable observations of the terrestrial CO2 exchange. However, these data are sparse in both space and time, and thus cannot provide an estimate of global spatiotemporal changes or capture rare extreme conditions (droughts, heatwaves). In this project we will first work with synthetic data: we will sample CO2 fluxes from a simulation of the Earth system at the observation locations and then use various machine learning algorithms (neural networks, boosting, GANs) to reconstruct the model’s CO2 flux at all locations. We will then evaluate the performance of each method using a suite of regression metrics. Finally, time permitting, we will apply these methods to real observations. This project provides a way of evaluating the performance of machine learning methods as they are used in Earth science.
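The sample-reconstruct-evaluate workflow described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the project's method: the flux field `synthetic_flux`, the grid, the number of stations, and the helper names are all hypothetical, and a simple inverse-distance-weighted nearest-neighbour regressor stands in for the neural networks, boosting, and GANs named in the abstract. Latitude and longitude are also treated as flat Euclidean coordinates for simplicity.

```python
import numpy as np

def synthetic_flux(lat, lon):
    """Hypothetical smooth 'simulated' flux field (arbitrary units)."""
    return np.sin(2.0 * np.radians(lat)) + 0.5 * np.cos(3.0 * np.radians(lon))

def knn_predict(train_xy, train_y, query_xy, k=5):
    """Inverse-distance-weighted k-nearest-neighbour regression (stand-in model)."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]           # indices of the k closest stations
    nearest_d = np.take_along_axis(d, idx, axis=1)
    w = 1.0 / (nearest_d + 1e-9)                 # small constant avoids division by zero
    return (w * train_y[idx]).sum(axis=1) / w.sum(axis=1)

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# 1. Simulated flux on a full grid (the "truth" the methods try to recover).
lat, lon = np.meshgrid(np.linspace(-60, 60, 25), np.linspace(-180, 180, 49), indexing="ij")
grid_xy = np.column_stack([lat.ravel(), lon.ravel()])
truth = synthetic_flux(grid_xy[:, 0], grid_xy[:, 1])

# 2. Sample the simulation at a sparse set of assumed observation locations.
rng = np.random.default_rng(0)
station_idx = rng.choice(len(grid_xy), size=100, replace=False)
train_xy, train_y = grid_xy[station_idx], truth[station_idx]

# 3. Reconstruct the flux everywhere and score it with regression metrics.
recon = knn_predict(train_xy, train_y, grid_xy, k=5)
print(f"RMSE = {rmse(truth, recon):.3f}, R^2 = {r2(truth, recon):.3f}")
```

Because the truth is known at every grid point, any candidate method can be dropped in place of `knn_predict` and compared on the same metrics.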

Using speech and language to identify patients at risk for hospitalizations and emergency department visits in homecare

January 4, 2021 in Closed Spring 2021, Closed Summer 2021, Closed Flexible Timeline 2021

This study is the first step in exploring an emerging and previously understudied data stream: verbal communication between healthcare providers and patients. In a partnership between Columbia Engineering, the School of Nursing, Amazon, and the largest home healthcare agency in the US, the study will investigate how to use audio-recorded routine communications between patients and nurses to help identify patients at risk of hospitalization or emergency department visits. The study will combine speech recognition, machine learning and natural language processing to achieve its goals.

Data driven approaches to finding undiagnosed HS patients in EHR

September 8, 2020 in Open Projects Fall 2020

Electronic Health Records (EHR) provide a rich, integrated source of phenotypic information that allows for automated extraction and recognition of phenotypes from EHR narratives and provides an efficient framework for conducting epidemiological and clinical studies. In addition, when EHR are linked to genetic data in electronic biorepositories such as eMERGE and All of Us, phenotype information embedded in EHR can be used to efficiently construct cohorts powered for genetic discoveries. However, limitations arise from repurposing data generated by healthcare processes for research, including data sparseness, low-quality data and diagnostic errors. Phenotyping algorithms are developed to overcome these limitations, providing a robust means to assess case status.

Physically-informed polycrystal plasticity models of beta-HMX

September 8, 2020 in Open Projects Fall 2020

Traditionally, these types of data are routinely neglected in hand-crafted constitutive models due to their complexity. Instead, descriptors such as void fraction, dislocation density, and other statistical measures of the microstructures are often incorporated into yield surface or hardening rules (e.g. Gurson damage model, critical state plasticity). In this work, we will overcome this technical barrier by using a deep convolutional neural network to deduce low-dimensional descriptors that best describe the physics of the deformation process of polycrystals. With deep Q reinforcement learning to automate the trial-and-error process, we may explore the decision tree with a large number of trials that would be impossible to perform manually. This treatment will empower us to discover the underlying mechanics of polycrystals under a variety of pressures, temperatures, and loading rates highly relevant to Air Force applications. While previous work on data-driven models has often focused on complete substitution of constitutive laws with a data-driven paradigm, I intend to seek the best option for representing the hierarchy of material responses, while implementing adversarial attacks to determine hidden weaknesses of existing polycrystal plasticity models as well as those generated by the ML approaches. I will make use of a collocation Fast Fourier Transform (FFT) solver to speed up the generation of the material database, digesting microstructural data via descriptors in non-Euclidean space, graph-based knowledge abstraction, and adversarial attacks.

Combining in vivo calcium imaging datasets and deep learning networks for video analysis (DeepLabCut) to identify novel brain regulators of fine motor learning in mice

January 15, 2020 in Project Spring 2020, Project Summer 2020

Our goal is to use deep learning networks to understand which neurons in the brain encode fine motor movements in mice. We collected large datasets comprising calcium imaging data of active neurons and high-resolution videos recorded while mice perform motor tasks. We want to use recent advances in deep learning to (1) estimate the poses of mouse body parts at a high spatiotemporal resolution, (2) extract behaviorally relevant information, and (3) align them with neural activity data. Behavioral video analysis is made possible by transfer learning, the ability to take a network that was trained on a task with a large supervised dataset and utilize it on a small supervised dataset. This has been used, for example, in a human pose-estimation algorithm called DeeperCut. Recently, such algorithms were tailored for use in the laboratory in a Python-based toolbox known as DeepLabCut, providing a tool for high-throughput behavioral video analysis.
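Step (3), aligning pose estimates with neural activity, usually requires putting two recordings with different frame rates on a common time base. The sketch below shows one simple way to do that with linear interpolation. It is an assumption-laden illustration, not the project's pipeline: the frame rates, the synthetic circular trajectory, and the helper `align_to_imaging` are all hypothetical, and real pose tracks would come from a tool such as DeepLabCut rather than being generated here.

```python
import numpy as np

def align_to_imaging(video_t, pose_xy, imaging_t):
    """Linearly interpolate each pose coordinate onto the imaging timestamps."""
    columns = [np.interp(imaging_t, video_t, pose_xy[:, j]) for j in range(pose_xy.shape[1])]
    return np.column_stack(columns)

# Assumed acquisition settings (hypothetical values for illustration only).
video_fps, imaging_fps, duration_s = 100.0, 30.0, 2.0

# Timestamps in seconds for every video frame and every imaging frame.
video_t = np.arange(int(duration_s * video_fps)) / video_fps
imaging_t = np.arange(int(duration_s * imaging_fps)) / imaging_fps

# Synthetic (x, y) track of one body part: a circle traced once per second.
pose_xy = np.column_stack([np.cos(2 * np.pi * video_t), np.sin(2 * np.pi * video_t)])

# One pose sample per imaging frame, ready to pair with neural activity traces.
aligned = align_to_imaging(video_t, pose_xy, imaging_t)
print(aligned.shape)
```

Linear interpolation is a reasonable default when the video is sampled faster than the imaging; if the two clocks drift or start at different times, the timestamps would first need to be synchronized from a shared trigger signal.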

COSMOS Smart Intersections, Cloud-Connected Vehicles

January 15, 2020 in Project Spring 2020, Project Summer 2020

Research on: (i) COSMOS cloud-connected vehicles; (ii) monitoring of traffic intersections using bird’s-eye cameras, supported by ultra-low-latency computational/communications hubs; (iii) simultaneous video-based tracking of cars and pedestrians, and prediction of movement based on long-term observations of the intersection; (iv) real-time computational processing using deep learning on GPUs, in support of COSMOS applications; (v) sub-10 ms latency communication between all vehicles and the edge cloud computational/communication hub, to be used in support of autonomous vehicle navigation. The research is performed using the pilot node of the COSMOS project infrastructure.

Columbia Data Science Institute (DSI) Scholars Program