Understanding the interaction between human-associated microbial communities and human health is expected to revolutionize healthcare. Recent work found that this interaction is shaped, in part, by genetic differences between closely related strains of the same species in the microbiome. Detecting this variation, however, is a significant challenge. This project aims to profile microbial genetic variation within and across multiple patients' microbiomes, allowing us to better compare and interpret this variation in the context of human disease and to gain mechanistic insight into complex human-microbiome interactions.

The goal of the project is twofold: 1) to better understand and further improve the use of low-cost air pollution sensors and 2) to analyze and characterize air pollution data in sub-Saharan Africa. Air pollution kills an estimated 700,000 people per year in Africa, but existing air pollution data for the continent are extremely sparse, and estimates of the associated mortality are uncertain. Low-cost air pollution sensors have the potential to rapidly improve air quality awareness and data availability in data-sparse regions of the world, including sub-Saharan Africa. However, low-cost sensors require careful calibration, performance evaluation, and other quality assurance before their data can be trusted to the same degree as those from regulatory-grade monitors. As part of a larger project led by Dr. Westervelt, fine particulate matter (PM2.5) sensors have already been deployed in several African megacities, including Kinshasa, Democratic Republic of the Congo; Nairobi, Kenya; Kampala, Uganda; Accra, Ghana; and Lomé, Togo. In Kampala and Accra, sensors have been co-located with a regulatory-grade PM2.5 instrument for several months, allowing a direct comparison between low-cost and regulatory-grade PM2.5 measurements and enabling the development of calibration factors.
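
As a rough illustration of what deriving "calibration factors" from co-located measurements can look like, the sketch below fits a simple linear correction (a slope and an intercept) that maps raw low-cost sensor PM2.5 readings onto the co-located reference instrument by ordinary least squares. The linear form and the function names `fit_linear_calibration` and `apply_calibration` are hypothetical choices for illustration, not the project's actual method; in practice, calibrations for optical PM2.5 sensors often add covariates such as relative humidity, which this sketch omits.

```python
from math import isnan


def fit_linear_calibration(raw, reference):
    """Fit reference ~= slope * raw + intercept by ordinary least squares.

    Hypothetical helper: assumes a purely linear relationship between the
    low-cost sensor and the co-located reference monitor.
    """
    # Keep only time steps where both instruments reported a value.
    pairs = [(x, y) for x, y in zip(raw, reference) if not (isnan(x) or isnan(y))]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired measurements")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("raw readings are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(raw, slope, intercept):
    """Correct raw low-cost sensor readings with the fitted factors."""
    return [slope * x + intercept for x in raw]
```

For example, fitting on a co-location period and then correcting readings from a sensor deployed elsewhere: `slope, intercept = fit_linear_calibration(sensor_pm25, reference_pm25)` followed by `apply_calibration(new_readings, slope, intercept)`. Evaluating the fit on held-out co-location data, rather than on the data used to fit it, gives a fairer picture of sensor performance.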

This project uses a novel corpus of text-based school data to develop a multi-dimensional measure of the degree to which American colleges and universities offer a liberal arts education. We seek a data scientist to carry out a range of tasks on this project, which analyzes multiple text corpora to better understand the liberal arts. This is an ongoing three-year project with opportunities for future collaborations, academic publications, and developing and improving data science and machine learning skills. Tasks will likely include: (1) using Amazon Web Services to create and maintain cloud-based storage (SQL, S3 buckets) for the project's expanding library of data; (2) extracting information (named entities, times, places, books, and so on) from millions of plain-text syllabus records; (3) merging multiple forms of data into a single dataset; and (4) scraping websites for relevant information (e.g., college course offerings, school rankings). Some pages may include dynamically generated content that requires a tool such as Selenium.

The spread of COVID-19 has led to unprecedented and ongoing changes to daily life, including shelter-in-place orders, widespread closures of businesses and schools, and working and schooling from home at a previously unseen scale. These changes in behavior are placing extraordinary demands on the Internet. This project will measure the Internet's ability to meet these demands, including comparing its performance before, during, and after the peak of COVID-19; whether the amount of change differs between areas heavily impacted by COVID-19 and those less impacted; and whether and how large networks adapt. To provide this rich understanding, the project will combine multiple Internet-scale datasets that offer complementary views, investigating how responses to COVID-19 have affected the Internet and how networks have reacted. Measuring the network impact of COVID-19 will illuminate the Internet's strengths and weak points and is a crucial step toward improving its future resilience in the face of pandemics, natural disasters, large-scale conflict, and terrorist attacks.

Traditionally, microstructural data of this kind are neglected in hand-crafted constitutive models because of their complexity. Instead, descriptors such as void fraction, dislocation density, and other statistical measures of the microstructure are often incorporated into yield surfaces or hardening rules (e.g., the Gurson damage model, critical state plasticity). In this work, we will overcome this technical barrier by using a deep convolutional neural network to deduce low-dimensional descriptors that best describe the physics of the deformation process of polycrystals. By using deep Q-learning (a reinforcement learning method) to automate the trial-and-error process, we can explore the decision tree with a number of trials that would be impossible to perform manually. This approach will allow us to discover the underlying mechanics of polycrystals under a variety of pressures, temperatures, and loading rates highly relevant to Air Force applications. While previous work on data-driven models has often focused on completely replacing constitutive laws with a data-driven paradigm, I intend to seek the option that best represents the hierarchy of material responses, while using adversarial attacks to expose hidden weaknesses of both existing polycrystal plasticity models and the models generated by the ML approaches. I will use a collocation Fast Fourier Transform (FFT) solver to speed up generation of the material database, and will digest microstructural data through descriptors in non-Euclidean space, graph-based knowledge abstraction, and adversarial attacks.

The human microbiome is associated with various diseases, but the metabolic mechanisms through which it can modulate health are mostly unknown. Understanding these mechanisms is of paramount importance for prevention and treatment. While metagenomic analysis reveals associations between the presence of microbes and specific diseases, metabolomic analysis can highlight metabolic alterations. Neither, however, can reveal the microbiome metabolic mechanisms underlying these alterations. To fill this knowledge gap, several microbiome metabolic modeling methods have recently been developed. However, the accuracy of these methods across different pathologies and microbiomes has never been rigorously evaluated.

Single-cell sequencing has generated unprecedented insight into the cellular complexity of normal and diseased organs. We are interested in using this technique to understand the mechanisms of eye development, disease, and regeneration. We would also like to compare transcriptomic signatures between mouse models and human tissues. This project involves analyzing large amounts of single-cell sequencing data. It requires an understanding of statistical analysis and strong programming skills.

Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program aims to engage and support undergraduate and master's students in data science research with Columbia faculty. The program's unique enrichment activities foster a collaborative learning community in data science at Columbia.

Columbia University DSI

New York, NY