Data For Good: African North Americans Database

September 6, 2022 in Open Fall 2022, Open Spring 2023, Data For Good

This project is the first comprehensive examination of African North Americans who crossed one of the U.S.-Canada borders, going either direction, after the Underground Railroad, in the generation alive roughly 1865-1930. It analyzes census and other records to match individuals and families across the decades, despite changes or ambiguities in their names, ages, “color,” birthplace, or other details.
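To make the matching problem concrete, here is a minimal sketch of how two census entries recorded a decade apart might be flagged as a candidate match despite a spelling variation and an imprecise age. It is an illustration only, not the project's actual method: the record fields, the thresholds, and the example people are all invented for this sketch, and it relies solely on Python's standard-library difflib.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_candidate_match(rec_a, rec_b, years_apart,
                       name_threshold=0.8, age_tolerance=3):
    """Flag two records as a possible match for the same person.

    rec_a is the earlier record and rec_b the later one. The thresholds
    are arbitrary illustrative defaults, not values from the project.
    """
    expected_age = rec_a["age"] + years_apart
    age_ok = abs(rec_b["age"] - expected_age) <= age_tolerance
    name_ok = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    return age_ok and name_ok


# Invented example records (not real census data).
earlier = {"name": "Mary Johnson", "age": 24}
later_same = {"name": "Mary Johnston", "age": 36}
later_other = {"name": "Thomas Reed", "age": 50}

likely = is_candidate_match(earlier, later_same, years_apart=10)
unlikely = is_candidate_match(earlier, later_other, years_apart=10)
```

A real linkage effort would also have to weigh birthplace, household members, and the recorded "color" field, and would need a human review step for borderline pairs.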

A Recipe For Creative Recipes

January 4, 2021 in Open Spring 2021, Open Summer 2021

This project has a two-fold aim. First, we seek to determine what makes an idea seem novel versus ordinary, and whether there is an ideal mix of the two. Second, building on these findings, we will build a generative model that suggests tweaks to an idea that enhance its perceived creativity and appeal. We will pursue these two aims using 69K recipes and reviews from allrecipes.com. We will use an NLP approach to extract important features from each recipe, such as ingredients, preparation instructions, and review content.

Comparison of four workflows for structural variants identification

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Recent advances in genomic technologies have led to the identification of many novel disease-gene associations, enabling more precise diagnoses. Along with the technologies enabling rapid DNA sequencing, multiple computational approaches have been developed to identify structural variants (i.e. relatively large deletions and duplications of genomic sequences). These workflows can lead to the identification of different structural variants, raising the risk of missing disease-causing variants when using only one of those methods.
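One reason structural variant calls are hard to compare across workflows is that two tools rarely report identical breakpoints for the same event. A common convention, assumed here and not stated in the project description, is to treat two calls as the same event when they overlap reciprocally by some fraction such as 50%. The function name and the coordinates below are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions for intervals a and b.

    Intervals are half-open [start, end) on the same chromosome.
    A result of 0.0 means the intervals do not overlap at all.
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


# Hypothetical deletion calls from two workflows (coordinates invented).
partial = reciprocal_overlap(100, 200, 150, 250)
disjoint = reciprocal_overlap(100, 200, 300, 400)
identical = reciprocal_overlap(100, 200, 100, 200)
```

Taking the minimum of the two fractions prevents a very large call from being counted as a match to a small call that merely sits inside it.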

Comparison of three workflows for the identification of genetic variants

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

Recent advances in genomic technologies have led to the identification of many novel disease-associated genes, enabling more precise diagnoses. Along with the technologies enabling rapid DNA sequencing, multiple computational approaches have been developed to extract the genetic information from raw data, including the Broad Institute’s GATK, Seven Bridges’ GenomeGraph and Google’s DeepVariant. These workflows can identify different sets of genetic variants, so relying on only one of them raises the risk of missing disease-causing variants.
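The comparison described above can be sketched as a set operation: which variants do all workflows agree on, and which are reported by only one? The sketch below assumes each workflow's output has already been reduced to a set of (chromosome, position, ref, alt) tuples; the function name and the toy variants are invented, and no real caller output is used.

```python
def compare_call_sets(call_sets):
    """Split variant calls into those shared by all workflows and those unique to one.

    call_sets maps a workflow name to a set of (chrom, pos, ref, alt) tuples
    and must contain at least one workflow.
    """
    shared = set.intersection(*call_sets.values())
    unique = {}
    for name, calls in call_sets.items():
        # Everything reported by any other workflow.
        others = set().union(*(c for n, c in call_sets.items() if n != name))
        unique[name] = calls - others
    return shared, unique


# Toy call sets (variants invented for illustration).
calls = {
    "workflow_a": {("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")},
    "workflow_b": {("chr1", 100, "A", "G"), ("chr3", 42, "G", "A")},
    "workflow_c": {("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")},
}
shared, unique = compare_call_sets(calls)
```

Variants that appear in only one workflow's unique set are the ones a single-workflow analysis by any other tool would miss.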

Data For Good: African North Americans Database

January 4, 2021 in Open Spring 2021, Open Summer 2021, Data For Good

This project is the first comprehensive examination of African North Americans who crossed one of the U.S.-Canada borders, going either direction, after the Underground Railroad, in the generation alive roughly 1865-1930. It analyzes census and other records to match individuals and families across the decades, despite changes or ambiguities in their names, ages, “color,” birthplace, or other details.

Identification of genomic and transcriptomic landscape of head and neck cancer

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

A highly collaborative project is available in Dr. Alison Taylor’s and Dr. Fatemeh Momen-Heravi’s labs. This project aims to identify molecular changes, such as mutations and RNA signatures, in head and neck cancer in Black/African American and Hispanic minority populations, with the goal of identifying novel therapies for cancer patients and reducing health disparities. The project entails analysis of DNA and RNA sequencing data. Basic coding skills are necessary, and the student will be mentored by both principal investigators. The prospective candidate should be motivated, a fast learner, and able to work in a highly collaborative team environment.

Meta-analysis of single-cell genomic data to define cellular heterogeneity and dynamics in atherosclerotic vasculature

January 4, 2021 in Open Spring 2021, Open Summer 2021

Atherosclerosis, a chronic inflammatory disease of the artery wall, is the underlying cause of human coronary heart disease. Single-cell genomics has catalyzed a revolution in the understanding of cellular heterogeneity and dynamics in atherosclerotic vasculature. The goal of the project is to perform a meta-analysis of published single-cell genomic data together with our own. Meta-analysis allows integrated analysis of much larger cell numbers, helps resolve the full spectrum of cellular heterogeneity and dynamics in atherosclerotic vessels, and facilitates therapeutic translation. The DSI scholar will: (1) use the latest bioinformatic pipeline to integrate the existing scRNA-seq, CITE-seq, and scATAC-seq datasets; (2) analyze the integrated datasets using R/Bioconductor packages (e.g. Seurat); (3) interpret the data using pathway and network analysis. Some relevant workflows are available through the “Resources” page of our lab website at https://hanruizhang.github.io/zhanglab/.

Columbia Data Science Institute (DSI) Scholars Program