Complex genome rearrangements in a unicellular model organism

September 1, 2021 in Open Fall 2021, Open Flexible Timeline

The Landweber Lab is looking for a computational student to help analyze long-read DNA sequence datasets from Oxford Nanopore and PacBio (third-generation sequencing platforms). These datasets were collected across a time course while single cells of the genus Oxytricha were undergoing RNA-guided natural genome editing. This process produces an “output” product genome that is completely different from the precursor “input,” or germline, genome, and it has been compared to a cellular computer. The goal will be to capture and classify long reads in these DNA datasets that represent the intermediate steps in genome rearrangement, when chromosomes mix and match hundreds of thousands of precursor building blocks to assemble a mature genome of 18,000 new chromosomes during programmed nuclear development.

COVID-19 Evidence Retrieval and Synthesis Using PubMed Abstracts

September 1, 2021 in Open Fall 2021, Open Flexible Timeline

Evidence-based medicine (EBM) is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The key difference between evidence-based medicine and traditional medicine is not that EBM considers the evidence while the latter does not, but rather that EBM demands better evidence. The medical literature is growing exponentially, and its free-text format hampers efficient evidence computing, so researchers, patients, and clinicians face significant challenges in evidence retrieval, appraisal, and synthesis. Our long-term goal is to develop natural language processing and text summarization methods to overcome these challenges. Our short-term goal is to build a computable evidence base for COVID-19 and to enable evidence synthesis and reasoning over COVID-19 study findings. We currently have a database of structured data elements for PubMed abstracts of randomized controlled trials published within the past 20 years. For this project, we expect participating students to develop methods to analyze the evidence in our evidence base, to build COVID-19 knowledge graphs, and to enable evidence synthesis and appraisal at scale. On this basis, we will also compare the evidence in the literature with the evidence derived from real-world data on COVID-19 patients.

Data For Good: Poverty Research Cartagena

September 1, 2021 in Open Fall 2021, Open Flexible Timeline

This project is an exciting opportunity to work with FEM, a non-profit that aims to address social rights issues among rural communities in Colombia. The team is currently working with the mayor’s office in Cartagena, Colombia, to access multiple datasets and data sources in order to establish who the extreme poor in Cartagena are and where they live. The team piloted a smartphone-based data collection strategy during Hurricane Iota and obtained data on 4845 households in a very vulnerable area. The first data science challenge was to wrangle and analyze these data to better characterize the population and its needs. This part of the project was conducted in the Spring and Summer of 2021.

Data For Good: The Cost of Human Rights Violations

September 1, 2021 in Open Fall 2021, Open Flexible Timeline

Rights CoLab is working with the Sustainability Accounting Standards Board (SASB) to define a strengthened set of disclosure standards that investors can use to persuade companies to improve labor rights for both their workforce and workers in supply chains. The project has two components: 1) a data science project, and 2) an Expert Group.

Evaluating the impact of telemedicine on chronic disease outcomes

September 1, 2021 in Open Fall 2021, Open Flexible Timeline

The Center for Behavioral Cardiovascular Health has been at the forefront of developing virtual interventions (e.g., home BP telemonitoring, iHeart enhanced depression screening app, and COVID remote care program) to improve the management of hypertension, depression, and COVID. We are looking for motivated mentees to help us organize and analyze a wide breadth of patient data from the Columbia-New York Presbyterian data warehouse to understand the impact of the interventions on patient outcomes. The mentee will work alongside highly experienced biostatisticians and our Center’s professionalized data team managers. The mentee will develop expertise in analyzing health system data, and the work will inform decisions to modify, sustain, or de-implement existing programs.

Health Behaviors and Cardiovascular Risk

September 1, 2021 in Open Fall 2021, Open Flexible Timeline

The project involves analyzing lifestyle behaviors, namely diet and sleep, in relation to cardiovascular health. Analyses will be completed using data from NHANES (the National Health and Nutrition Examination Survey). Analyses will also be completed using data from a 24/7 sleep and activity tracker in relation to blood pressure. The student will contribute to data management and statistical analyses and will be a co-author on manuscripts submitted for publication.

Mapping NYPD Subway Fare Evasion Enforcement

September 1, 2021 in Open Fall 2021, Open Flexible Timeline

This project is the next phase in ongoing research to document how the MTA and NYPD use public resources to criminalize poverty at the subway turnstile, especially in Black and Brown communities.

Columbia Data Science Institute (DSI) Scholars Program