Since the Industrial Revolution, the atmosphere has continued to warm due to an accumulation of carbon. Terrestrial ecosystems play a crucial role in mitigating the effects of climate change by storing atmospheric carbon in biomass and in soils. To inform carbon reduction policy, an accurate quantification of land-air carbon fluxes is necessary. Direct monitoring of surface carbon fluxes at a few locations across the globe provides valuable observations of the terrestrial CO2 exchange. However, these data are sparse in both space and time and thus cannot provide an estimate of global spatiotemporal changes or of rare extreme conditions (droughts, heatwaves). In this project we will first use synthetic data: we will sample CO2 fluxes from a simulation of the Earth system at observation locations and then use various machine learning algorithms (neural networks, boosting, GANs) to reconstruct the model’s CO2 flux at all locations. We will then evaluate the performance of each method using a suite of regression metrics. Finally, time permitting, we will apply these methods to real observations. This project provides a way of evaluating the performance of machine learning methods as they are used in Earth science.
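As a rough illustration of the synthetic-data experiment described above, the following Python sketch samples a made-up flux field at a handful of "station" grid cells, fits a model there, reconstructs the flux everywhere, and scores the result on the unobserved cells. Everything in it is an assumption for illustration only: the toy flux field, the two drivers (temperature and moisture), the function names, and the use of ridge regression as a simple stand-in for the neural networks, boosting, and GANs named in the project.

```python
import numpy as np


def make_synthetic_flux(n_lat=20, n_lon=40, seed=0):
    """Build a toy gridded CO2 flux field driven by two hypothetical predictors."""
    rng = np.random.default_rng(seed)
    lat = np.linspace(-60.0, 60.0, n_lat)
    lon = np.linspace(-180.0, 180.0, n_lon, endpoint=False)
    lon_grid, lat_grid = np.meshgrid(lon, lat)
    # Hypothetical drivers; a real Earth system simulation would supply these.
    temperature = 25.0 - 0.3 * np.abs(lat_grid) + rng.normal(0.0, 1.0, lat_grid.shape)
    moisture = 0.5 + 0.4 * np.cos(np.deg2rad(lon_grid)) * np.cos(np.deg2rad(lat_grid))
    # Toy flux: an interaction term plus a linear term plus observation noise.
    flux = (-0.2 * temperature * moisture + 0.05 * temperature
            + rng.normal(0.0, 0.05, lat_grid.shape))
    features = np.column_stack([temperature.ravel(), moisture.ravel()])
    return features, flux.ravel()


def design_matrix(features):
    """Add an intercept and an interaction column to the raw predictors."""
    t, m = features[:, 0], features[:, 1]
    return np.column_stack([np.ones_like(t), t, m, t * m])


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression (stand-in for the project's ML methods)."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def run_experiment(n_stations=50, seed=0):
    """Train on sparse station cells, reconstruct all cells, score held-out cells."""
    features, flux = make_synthetic_flux(seed=seed)
    rng = np.random.default_rng(seed + 1)
    station_idx = rng.choice(flux.size, size=n_stations, replace=False)
    X_all = design_matrix(features)
    weights = fit_ridge(X_all[station_idx], flux[station_idx])
    reconstruction = X_all @ weights
    held_out = np.setdiff1d(np.arange(flux.size), station_idx)
    return {
        "rmse": rmse(flux[held_out], reconstruction[held_out]),
        "r2": r2(flux[held_out], reconstruction[held_out]),
    }


if __name__ == "__main__":
    print(run_experiment())
```

Because the synthetic "truth" is known at every grid cell, the metrics can be computed on locations the model never saw, which is the main advantage of starting with simulated rather than real observations.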
A highly collaborative project is available in Dr. Alison Taylor’s and Dr. Fatemeh Momen-Heravi’s lab. This project aims to identify molecular changes, such as mutations and RNA signatures, of head and neck cancer in Black/African American and Hispanic minority populations, with the goal of identifying novel therapies for cancer patients and reducing health disparities. The project entails analysis of DNA and RNA sequencing data. Basic coding skills are necessary, and the student will be mentored by both principal investigators. The prospective candidate should be motivated, a fast learner, and able to work in a highly collaborative team environment.
Atherosclerosis, a chronic inflammatory disease of the artery wall, is the underlying cause of human coronary heart disease. Single-cell genomics has catalyzed a revolution in the understanding of cellular heterogeneity and dynamics in atherosclerotic vasculature. The goal of the project is to leverage published data and our own single-cell genomic data to perform a meta-analysis. Meta-analysis allows integrated analysis of much larger cell numbers, helps resolve the full spectrum of cellular heterogeneity and dynamics in atherosclerotic vessels, and facilitates therapeutic translation. The DSI scholar will: (1) use the latest bioinformatic pipeline to integrate the existing scRNA-seq, CITE-seq, and scATAC-seq datasets; (2) analyze the integrated datasets using R/Bioconductor packages (e.g. Seurat); (3) interpret the data using pathway and network analysis. Some relevant workflows are available through the “Resources” page of our lab website at https://hanruizhang.github.io/zhanglab/.
We will further develop a large-scale dataset that evaluates gender biases in sentence-level NLP systems. We will then develop training techniques that encourage models to overcome and mitigate gender-based biases.
5G cellular networks will use high-frequency millimeter-wave (mmWave) communication, which promises high data rates and ample spectrum availability. Students working on this project will help conduct a mmWave wireless channel measurement campaign around the COSMOS testbed (www.cosmos-lab.org), a wireless networking testbed located at Columbia and stretching between 120th and 136th St. In collaboration with Bell Labs, students will be able to use unique, state-of-the-art mmWave equipment to conduct these measurements (see a pre-pandemic example at https://wimnet.ee.columbia.edu/wp-content/uploads/2019/08/mmNets2019_COSMOS_28GHz.pdf). The measurements will play an important role in the development of network-level control algorithms, which is the other, more analytical side of this research project.
The CONCERN project aims to develop models and tools that quantify clinician concern about patient deterioration in the inpatient setting and that can be used in early warning scores. We have discovered and validated several ways to measure clinician concern within the Electronic Health Record (EHR) and have demonstrated that our approach identifies patients at risk of deterioration earlier than other methods, which focus only on physiological data. One of our approaches leverages documentation of certain concepts within the narrative text of nursing notes that are consistent with concern about a patient. However, this narrative free text is not easily accessible: it is often mixed together with structured or templated text and varies across note types. The steps to be performed are
Chronic exposure to arsenic (As) in groundwater is a staggering global public health crisis, and yet we lack a complete understanding of the environmental conditions that govern As mobility and toxicity in groundwater and are unable to predict groundwater As concentrations with enough confidence to make effective management decisions. The objective of this project is to identify key hydrologic and biogeochemical variables that control groundwater As concentrations and heterogeneity across spatial scales in Southeast Asia and the USA. We then aim to develop clear mechanistic linkages and high-resolution geospatial information that can be used with machine learning to evaluate and predict groundwater As contamination. This project involves the integration of various types of large datasets from remotely sensed and field-collected measurements (e.g., surface hydrology and topography, groundwater geochemistry, climate, and population density). We are looking for a student to advance the connections between key environmental variables and groundwater As contamination across scales. The student will receive experience and mentorship in cutting-edge interdisciplinary research, and will have the opportunity to lead their own project and acquire analytical skills using creative measures, which can involve remote sensing, geospatial methods, statistics and graphing, machine learning, and predictive modeling.