Background: Genomes are inextricably tied to life as we know it, encoding the molecular information that organisms use. Next-generation DNA sequencing has made it possible to read genomes at scale from organisms that inhabit complex environments, rather than only from organisms typically studied in the lab. Alongside this, advances in algorithms are beginning to reveal the complex biology of genomes.

We are constantly exposed to input from the outside world, but we neither perceive nor remember everything we encounter. The state of our brain just before a sensory input arrives influences whether or not we process it. Brain oscillations have been proposed to play a key role in setting these brain states; however, exactly how these rhythms influence perception and other cognitive processes remains a topic of active research. The Brain Rhythms Lab investigates how brain rhythms gate the flow of information through the brain, how they facilitate interactions with the rest of the body, and how they influence cognitive functions.

Adverse pregnancy outcomes (APOs), such as preeclampsia and preterm birth, are common and devastating. Their human and economic costs are tremendous, and the United States has one of the highest APO rates among developed nations. APOs are especially common in non-White and low-income communities; in the United States, for example, Black women are 50% more likely than White women to deliver preterm. Research has shown that the increased risk of adverse outcomes in overburdened populations is not fully explained by socioeconomic status or other sociodemographic factors. Beyond facing elevated risk, non-White women in the United States may be less likely to receive certain interventions, such as treatment for postpartum depression, and more likely to receive others, such as cesarean section, suggesting that there may be unwarranted and discriminatory variation in pregnancy care.

Recent advances in genomic technologies have led to the identification of many novel disease-gene associations, enabling more precise diagnoses. Alongside the technologies that enable rapid DNA sequencing, multiple computational approaches have been developed to identify structural variants (i.e., relatively large deletions and duplications of genomic sequence). Different workflows can report different sets of structural variants, so relying on only one method risks missing disease-causing variants. Unfortunately, many of the variants these workflows report are artifacts (i.e., absent from the biological sample), raising concerns that time and effort will be spent on artifacts instead of on the causative genetic variant. The goal of this project is to develop best practices that increase the chance of identifying causative structural variants while reducing the number of artifacts. We will use raw data from whole-exome and whole-genome sequencing of patients with renal diseases. Students will be expected to (1) compare the output of four different structural-variant tools and visualize the differences (using R or Python), and (2) identify the tool-specific parameters that increase the sensitivity and specificity of each tool in distinguishing true variants from artifacts.
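
As an illustration of task (1), here is a minimal Python sketch of one way to compare structural-variant calls across tools. It assumes the calls have already been parsed (for example from each tool's VCF output, which is not shown) into simple interval records. Two calls count as the same event when they share a chromosome and variant type and reach at least 50% reciprocal overlap; that threshold is a common convention but is an assumption here, as are all names (`SV`, `reciprocal_overlap`, `has_match`, `evaluate`, `pairwise_shared`). Because true negatives are not well defined for variant calls, the sketch reports precision as a stand-in for specificity, measured against a hypothetical validated truth set.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SV:
    """One structural-variant call (hypothetical record type)."""
    chrom: str
    start: int   # 0-based, half-open interval [start, end)
    end: int
    svtype: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap as the smaller fraction of either call; 0.0 if chrom or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def has_match(call: SV, others: list[SV], min_ro: float = 0.5) -> bool:
    """True if `call` reaches the reciprocal-overlap threshold with any call in `others`."""
    return any(reciprocal_overlap(call, o) >= min_ro for o in others)


def evaluate(calls: list[SV], truth: list[SV], min_ro: float = 0.5) -> dict[str, float]:
    """Sensitivity (share of truth events recovered) and precision (share of calls that are true)."""
    recovered = sum(has_match(t, calls, min_ro) for t in truth)
    true_calls = sum(has_match(c, truth, min_ro) for c in calls)
    return {
        "sensitivity": recovered / len(truth) if truth else 0.0,
        "precision": true_calls / len(calls) if calls else 0.0,
    }


def pairwise_shared(callsets: dict[str, list[SV]], min_ro: float = 0.5) -> dict[tuple[str, str], int]:
    """For each tool pair (a, b), count calls from tool a that have a match in tool b."""
    return {
        (a, b): sum(has_match(c, callsets[b], min_ro) for c in callsets[a])
        for a, b in combinations(sorted(callsets), 2)
    }


if __name__ == "__main__":
    # Made-up calls for illustration only; real input would come from each tool's output.
    truth = [SV("chr1", 1000, 5000, "DEL"), SV("chr2", 20000, 26000, "DUP")]
    callsets = {
        "tool_a": [SV("chr1", 1100, 5100, "DEL"), SV("chr3", 500, 900, "DEL")],
        "tool_b": [SV("chr1", 900, 4800, "DEL"), SV("chr2", 20500, 25500, "DUP")],
    }
    for name, calls in callsets.items():
        print(name, evaluate(calls, truth))
    print(pairwise_shared(callsets))
```

The pairwise counts could feed a Venn- or UpSet-style plot for the visualization step, and rerunning `evaluate` over a grid of tool parameters is one way to approach task (2).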

The IGM has performed diagnostic whole-exome or whole-genome sequencing on more than 5000 CUIMC patients with presentations including undiagnosed diseases of childhood, chronic kidney disease, fetal anomalies and neurological diseases (with a focus on epilepsy), among many others. These patients have been analyzed with a standardized diagnostic pipeline to identify single genotypes responsible for disease. Diagnostic genotypes are those that the study team judges by consensus to be likely contributing to the patient’s presentation; the team is multidisciplinary and includes population geneticists, molecular geneticists, clinicians, genetic counselors, bioinformaticians and analysts. The student will be expected to (1) build an easy-to-use interface and a secure database for variant records and phenotype information, and (2) implement a web interface that facilitates variant submission to ClinVar, a freely accessible public archive of reports of the relationships between human variations and phenotypes, with supporting evidence.
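
One possible starting point for the database in task (1) is sketched below with Python's built-in `sqlite3` module. The schema, the table and column names, the use of HPO terms for phenotypes, and the helper functions are all hypothetical. A production system would also need access control, encryption and audit logging, which SQLite alone does not provide, and the actual ClinVar submission format is not modeled here. The `pending_submissions` query simply lists variants not yet flagged as submitted, together with the patient’s phenotype terms, as input for a submission interface.

```python
import sqlite3

# Hypothetical schema: one row per patient, per phenotype term, and per diagnostic variant.
SCHEMA = """
CREATE TABLE IF NOT EXISTS patient (
    patient_id TEXT PRIMARY KEY              -- de-identified internal ID
);
CREATE TABLE IF NOT EXISTS phenotype (
    patient_id TEXT NOT NULL REFERENCES patient(patient_id),
    hpo_term   TEXT NOT NULL,                -- e.g. an HPO ID such as 'HP:0001250'
    PRIMARY KEY (patient_id, hpo_term)
);
CREATE TABLE IF NOT EXISTS variant (
    variant_id     INTEGER PRIMARY KEY,
    patient_id     TEXT NOT NULL REFERENCES patient(patient_id),
    chrom          TEXT NOT NULL,
    pos            INTEGER NOT NULL,
    ref            TEXT NOT NULL,
    alt            TEXT NOT NULL,
    gene           TEXT,
    classification TEXT,                     -- consensus call from the study team
    submitted_to_clinvar INTEGER NOT NULL DEFAULT 0
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_patient(conn: sqlite3.Connection, patient_id: str, hpo_terms: list[str]) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO patient (patient_id) VALUES (?)", (patient_id,))
        conn.executemany(
            "INSERT INTO phenotype (patient_id, hpo_term) VALUES (?, ?)",
            [(patient_id, term) for term in hpo_terms],
        )


def add_variant(conn: sqlite3.Connection, patient_id: str, chrom: str, pos: int,
                ref: str, alt: str, gene: str, classification: str) -> int:
    with conn:
        cur = conn.execute(
            "INSERT INTO variant (patient_id, chrom, pos, ref, alt, gene, classification) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (patient_id, chrom, pos, ref, alt, gene, classification),
        )
    return cur.lastrowid


def pending_submissions(conn: sqlite3.Connection) -> list[tuple]:
    """Variants not yet flagged as submitted, with the patient's phenotype terms joined by ';'."""
    return conn.execute(
        """
        SELECT v.variant_id, v.gene, v.chrom, v.pos, v.ref, v.alt, v.classification,
               GROUP_CONCAT(p.hpo_term, ';')
        FROM variant AS v
        LEFT JOIN phenotype AS p ON p.patient_id = v.patient_id
        WHERE v.submitted_to_clinvar = 0
        GROUP BY v.variant_id
        ORDER BY v.variant_id
        """
    ).fetchall()


def mark_submitted(conn: sqlite3.Connection, variant_id: int) -> None:
    with conn:
        conn.execute(
            "UPDATE variant SET submitted_to_clinvar = 1 WHERE variant_id = ?", (variant_id,)
        )
```

Keeping phenotypes in their own table lets a patient carry any number of terms, and the foreign keys reject variant records for unknown patients.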

Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program aims to engage and support undergraduate and master’s students in data science research with Columbia faculty. The program’s unique enrichment activities will foster a collaborative learning community in data science at Columbia.

Columbia University DSI

New York, NY