The CONCERN project aims to develop models and tools that quantify clinician concern about patient deterioration in the inpatient setting and that can be used in early warning scores. We have identified and validated several ways to measure clinician concern within the Electronic Health Record (EHR) and have demonstrated that our approach identified patients at risk of deterioration earlier than other methods, which focus only on physiological data. One of our approaches leverages the documentation, within the narrative text of nursing notes, of certain concepts that are consistent with concern about a patient. However, this narrative free text is not easily accessible: it is often mixed together with structured or templated text and varies across note types. The steps to be performed are
Chronic exposure to arsenic (As) in groundwater is a staggering global public health crisis, yet we lack a complete understanding of the environmental conditions that govern As mobility and toxicity in groundwater and are unable to predict groundwater As concentrations with enough confidence to make effective management decisions. The objective of this project is to identify key hydrologic and biogeochemical variables that control groundwater As concentrations and heterogeneity across spatial scales in Southeast Asia and the USA. We then aim to develop clear mechanistic linkages and high-resolution geospatial information that can be used with machine learning to evaluate and predict groundwater As contamination. This project involves the integration of various types of large datasets from remotely sensed and field-collected measurements (e.g., surface hydrology and topography, groundwater geochemistry, climate, and population density). We are looking for a student to advance the connections between key environmental variables and groundwater As contamination across scales. The student will receive experience and mentorship in cutting-edge interdisciplinary research, and will have the opportunity to lead their own project and acquire analytical skills using creative measures, which can involve remote sensing, geospatial methods, statistics and graphing, machine learning, and predictive modeling.
Advances in genomic technologies have led to the identification of many novel disease-gene associations, allowing medical diagnoses to be more precise and tailored to an individual. However, the high number of variants present in each individual represents a significant challenge for the implementation of genomic medicine. The goal of this project is to enable the identification of novel genes associated with recessive disorders.
The ocean significantly mitigates climate change by absorbing fossil fuel carbon from the atmosphere. Since preindustrial times, the ocean has absorbed a cumulative 40% of emissions. To understand past changes, diagnose ongoing changes, and predict the future behavior of the ocean carbon sink, we must understand its spatial and temporal variability. However, the ocean is poorly sampled, so we cannot do this directly from in situ measurements.
Vehicle routing has been studied extensively as an optimization problem. Building on advances in AI and big data, this project aims to solve vehicle routing problems (VRP) using reinforcement learning.
Poor air quality is a major global crisis, leading to about 5 million premature deaths every year. In sub-Saharan Africa, there is little air pollution data available to characterize the problem, and a lack of focus on solutions. Using output from a high spatiotemporal resolution atmospheric chemistry transport model over Africa simulated by Dr. Westervelt and his group, the student will characterize levels of pollution and validate model results by comparing observed data to model output. The student will also analyze results from sensitivity simulations in which sources of air pollution have been artificially “turned off” in the model. Comparison between the baseline and sensitivity simulations will allow for source attribution of air pollution, which is important for developing satisfactory mitigation strategies to improve air quality.
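The comparison described above can be illustrated with a minimal sketch. It assumes each simulation has already been reduced to a list of pollutant concentrations, one value per grid cell or monitoring site; the function names and this data layout are hypothetical and are not taken from the project's model or tooling.

```python
def source_contribution(baseline, source_off):
    """Concentration attributed to a source: the baseline run minus the
    run in which that source was turned off, cell by cell."""
    if len(baseline) != len(source_off):
        raise ValueError("both simulations must cover the same cells")
    return [b - s for b, s in zip(baseline, source_off)]


def fractional_contribution(baseline, source_off):
    """Share of the baseline concentration attributed to the source.
    Cells with a zero baseline are reported as 0.0 to avoid dividing by zero."""
    diffs = source_contribution(baseline, source_off)
    return [d / b if b > 0 else 0.0 for d, b in zip(diffs, baseline)]


def mean_bias(modeled, observed):
    """Average of (model - observation) over paired values, a simple
    statistic for checking model output against observed data."""
    if len(modeled) != len(observed) or not modeled:
        raise ValueError("need equally sized, non-empty series")
    return sum(m - o for m, o in zip(modeled, observed)) / len(modeled)
```

A real analysis would work on gridded model output and use additional validation statistics; the subtraction of the "source off" run from the baseline is the core of the attribution idea.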
The goal of this project is to collect anonymized traces from the Columbia network in order to analyze video traffic characteristics during the work/study-from-home period. This information will be used for developing various ML-based tools for Quality of Experience (QoE) measurement. We will perform feature extraction at collection time and use anonymization techniques (e.g., IP address anonymization) to preserve user privacy. Students will analyze and measure encrypted network traffic to provide ground truth for potential RL/ML algorithms that estimate video QoE and identify devices, applications, and events (e.g., the start of a video streaming session). These algorithms can serve as a basis for new video adaptation techniques (see, for example, https://wimnet.ee.columbia.edu/wimnet-team-wins-3rd-place-in-the-acm-mmsys20-twitch-grand-challenge/).
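As an illustration of IP address anonymization at collection time, the sketch below replaces each address with a keyed hash so that flows from the same host can still be grouped without storing the address itself. The project does not specify which technique it uses; the keyed-hash approach, the function name `anonymize_ip`, and the 16-character token length are assumptions made for this example.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip, key):
    """Map an IPv4 or IPv6 address to a stable pseudonymous token.

    `key` is a secret byte string held only by the collector (an assumption
    of this sketch). The same address and key always give the same token,
    so per-host statistics survive while the address is not stored.
    """
    packed = ipaddress.ip_address(ip).packed  # validates and normalizes the address
    digest = hmac.new(key, packed, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; hypothetical length choice
```

This mapping does not preserve subnet structure; if analysis needs addresses from the same prefix to stay related, a prefix-preserving scheme would be required instead.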