Chronic exposure to arsenic (As) in groundwater is a staggering global public health crisis, yet we lack a complete understanding of the environmental conditions that govern As mobility and toxicity in groundwater and are unable to predict groundwater As concentrations with enough confidence to make effective management decisions. The objective of this project is to identify key hydrologic and biogeochemical variables that control groundwater As concentrations and heterogeneity across spatial scales in Southeast Asia and the USA. We then aim to develop clear mechanistic linkages and high-resolution geospatial information that can be used with machine learning to evaluate and predict groundwater As contamination. This project involves the integration of various types of large datasets from remotely sensed and field-collected measurements (e.g., surface hydrology and topography, groundwater geochemistry, climate, and population density). We are looking for a student to advance the connections between key environmental variables and groundwater As contamination across scales. The student will receive experience and mentorship in cutting-edge interdisciplinary research, and will have the opportunity to lead their own project and acquire analytical skills using creative approaches, which can involve remote sensing, geospatial methods, statistics and graphing, machine learning, and predictive modeling.

Advances in genomic technologies have led to the identification of many novel disease-gene associations, allowing medical diagnoses to be more precise and tailored to an individual. However, the high number of variants present in each individual represents a significant challenge for the implementation of genomic medicine. The goal of this project is to enable the identification of novel genes associated with recessive disorders.

Poor air quality is a major global crisis, leading to about 5 million premature deaths every year. In sub-Saharan Africa, there is little air pollution data available to characterize the problem, and a lack of focus on solutions. Using output from a high spatiotemporal resolution atmospheric chemistry transport model over Africa simulated by Dr. Westervelt and his group, the student will characterize levels of pollution and validate model results by comparing observed data to model output. The student will also analyze results from sensitivity simulations in which sources of air pollution have been artificially “turned off” in the model. Comparison between the baseline and sensitivity simulations will allow for source attribution of air pollution, which is important for developing effective mitigation strategies to improve air quality.
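As a rough illustration of the comparison described above, the contribution of a source can be estimated as the difference between a baseline simulation and a simulation with that source turned off. The sketch below is a minimal example under stated assumptions, not the group's actual workflow: the function name, the use of PM2.5, and the concentration values are all hypothetical, and a simple difference is only an approximation when the atmospheric chemistry responds nonlinearly.

```python
def source_contribution(baseline, zero_out):
    """Estimate a source's contribution per grid cell.

    baseline: concentrations with all sources on (hypothetical values).
    zero_out: concentrations with one source turned off.
    Returns (absolute difference, fraction of baseline) per cell.
    """
    if len(baseline) != len(zero_out):
        raise ValueError("simulations must cover the same grid cells")
    absolute = [b - z for b, z in zip(baseline, zero_out)]
    # Guard against division by zero in cells with no pollution at all.
    fraction = [a / b if b > 0 else 0.0 for a, b in zip(absolute, baseline)]
    return absolute, fraction


# Made-up PM2.5 concentrations (micrograms per cubic meter) for three grid cells.
baseline_pm25 = [40.0, 25.0, 10.0]
no_source_pm25 = [30.0, 20.0, 10.0]

absolute, fraction = source_contribution(baseline_pm25, no_source_pm25)
print(absolute)
print(fraction)
```

In practice the inputs would be gridded model output rather than short lists, but the arithmetic per grid cell is the same.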

We have a position open for one or more students interested in working on systems biology projects in bladder and prostate cancer. Specifically, we are looking for students who are well versed in statistical analysis: a basic understanding of standard statistical techniques and knowledge of R are required, and experience applying them to biology is a plus. The position will entail supporting postdoctoral members of the lab with computational analyses of different types of biological data across a wide range of projects.

The goal of this project is to study the molecular background of various congenital disorders affecting the cranial nerves, which are important in the senses (hearing, vision, smell), facial muscle movements, and more. Abnormal cranial nerve development can cause hearing loss, eye-movement disorders, facial weakness, loss of smell, and difficulties with respiration and swallowing. Some individuals may also have other motor, sensory, intellectual, behavioral, and social disabilities. These disorders cause significant disability and are caused by genetic variants, which are often novel or de novo. Unfortunately, disorders affecting the 8th cranial nerve, or vestibulocochlear nerve (CN VIII), which is important in hearing and balance, have been largely understudied. As various cranial nerves can be affected together, such as in Moebius syndrome, and as the vestibulocochlear nerve (CN VIII) and facial nerve (CN VII) also share a path in the internal auditory canal, it is likely that these disorders share underlying genes or closely interacting genes. To investigate the genetic architecture of cranial nerve abnormalities, we propose to molecularly investigate an in-house CN VIII cohort and other cranial dysinnervation cohorts. We will study rare genomic variants (both small variants and structural variants) to identify shared molecular pathways and genes among individuals with cranial dysinnervation disorders.

Freshwater supply is critical for managing and meeting human and ecological demands. However, while stocks of water in both natural and artificial reservoirs help increase availability, droughts, floods, and whiplash events reduce the reliability of these systems, with grave consequences for water users. This risk is particularly salient in the state of California, where many local communities have been plagued by extreme hydrological events. In this research, we contribute to California’s Water Data Challenge, in which a diverse group of volunteers convened to form a multidisciplinary team that addresses the crucial issue of extreme events in California using data science approaches. Members include researchers and professionals from a range of backgrounds representing academia and the private sector. We combine a range of publicly available datasets with Machine Learning (ML) techniques to explore the predictability of extreme events during California’s water years. More specifically, we use data from a variety of water districts to showcase how ML prediction models can not only predict the flow of water at varying time horizons but also capture uncertainties posed by climate and human influences.

This project is the first comprehensive examination of African North Americans who crossed one of the U.S.-Canada borders, in either direction, after the Underground Railroad, in the generation alive roughly 1865-1930. It analyzes census and other records to match individuals and families across the decades, despite changes or ambiguities in their names, ages, “color,” birthplace, or other details.

Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program aims to engage and support undergraduate and master's students in data science related research with Columbia faculty. The program’s unique enrichment activities will foster a learning and collaborative community in data science at Columbia.

Columbia University DSI

New York, NY