Comparison of four workflows for structural variants identification

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Recent advances in genomic technologies have led to the identification of many novel disease-gene associations, enabling more precise diagnoses. Along with the technologies enabling rapid DNA sequencing, multiple computational approaches have been developed to identify structural variants (i.e., relatively large deletions and duplications of genomic sequences). These workflows can identify different sets of structural variants, so relying on only one of them raises the risk of missing disease-causing variants.

Comparison of three workflows for the identification of genetic variants

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Recent advances in genomic technologies have led to the identification of many novel disease-associated genes, enabling more precise diagnoses. Along with the technologies enabling rapid DNA sequencing, multiple computational approaches have been developed to extract genetic information from raw data, including the Broad Institute’s GATK, Seven Bridges’ GenomeGraph and Google’s DeepVariant. These workflows can identify different sets of genetic variants, so relying on only one of them raises the risk of missing disease-causing variants.

Quantification of expected bi-allelic genetic variants' rate

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Advances in genomic technologies have led to the identification of many novel disease-gene associations, allowing medical diagnoses to be more precise and tailored to an individual. However, the high number of variants present in each individual represents a significant challenge for the implementation of genomic medicine. The goal of this project is to enable the identification of novel genes associated with recessive disorders.

Identification of genetic variation involved in Mendelian diseases

September 8, 2020 in Open Projects Fall 2020

This project will focus on the identification of genetic factors involved in various forms of hereditary diseases, including neurodevelopmental disorders, hearing loss, skeletal disorders and more. Some affected children endure years-long diagnostic odysseys of trial-and-error testing with inconclusive results and misdirected treatments. We are dedicated to tracking down the molecular causes of these diseases by integrating various “-omics” technologies, including genomics, transcriptomics and epigenomics.

Decoding the human genome with interpretable deep learning

September 30, 2019 in Project Fall 2019

The function of much of the 3 billion letters in the human genome remains to be understood. Advances in DNA sequencing technology have generated enormous amounts of data, yet we do not have the tools to extract the rules of how the genome works. Deep learning holds great potential for decoding the genome, in particular because of the digital nature of DNA sequences and its ability to handle large data sets. However, as in many other applications, the limited interpretability of deep learning models hampers their ability to help us understand the genome. We are developing deep learning architectures embedded with the principles of gene regulation, and we will leverage millions of existing whole-genome measurements of gene activity to learn a mechanistic model of gene regulation in human cells.

Columbia Data Science Institute (DSI) Scholars Program