A Recipe For Creative Recipes

January 4, 2021 in Open Spring 2021, Open Summer 2021

This project has a two-fold aim. First, we seek to determine what makes an idea seem novel versus ordinary, and whether there is an ideal mix of the two. Second, building on these findings, we will build a generative model that suggests tweaks to an idea that enhance its perceived creativity and appeal. We will pursue these two aims using 69K recipes and reviews from allrecipes.com. We will use an NLP approach to extract important features from each recipe, such as ingredients, preparation instructions, and review content.

COVID-19 Evidence Extraction and Computing

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

With the explosive growth of medical literature, making sense of medical evidence is harder than ever. The free-text form also makes it difficult to perform evidence retrieval or appraisal. There is a great need for tools and methods that can structure and reason over medical evidence. The goal of this project is to develop computational and symbolic methods to extract evidence from PubMed abstracts, integrate it with evidence derived from real-world clinical data (or practice-based evidence), and perform automated knowledge discovery and evidence reasoning. We also hope this research can support evidence-based medicine during the COVID-19 pandemic and give students opportunities to hone their skills in natural language processing, data mining, deep learning, and semantic knowledge engineering. We have solid preliminary results for the students to build upon. An open-source PICO parser that extracts Population, Intervention, Comparison, and Outcome information from PubMed abstracts has been developed and published. Current COVID-19 literature has been downloaded from PubMed and pre-processed. Preliminary analyses are under way to investigate patterns in the study populations of COVID-19 clinical studies. Our next steps include, but are not limited to, evidence summarization at the study level and evidence reasoning at the problem/topic level.

Mitigating Gender Bias in Sentence-level Natural Language Processing Models

January 4, 2021 in Open Spring 2021, Open Summer 2021

We will further develop a large-scale dataset that evaluates gender biases in sentence-level NLP systems. We will then develop training techniques that encourage models to overcome and mitigate gender-based biases.

Natural Language Processing within the CONCERN Project

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

The CONCERN project aims to develop models and tools to quantify clinician concern about patient deterioration in the inpatient setting that can be used in early warning scores. We have discovered and validated several measurable ways within the Electronic Health Record (EHR) to measure clinician concern and have demonstrated that our approach identified patients at risk of deterioration earlier than other methods, which focus only on physiological data. One of our approaches is leveraging documentation of certain concepts within narrative text in nursing notes that are consistent with concern about a patient. However, this narrative free text is not easily accessible - it is often mixed together with structured or templated text and varies over note types. The steps to be performed are

Using Data Science to Improve Telephone Triage of Ophthalmology Patients

January 4, 2021 in Open Spring 2021

Health care professionals cannot examine every person calling the office with a question nor can they return every call. Therefore, medical offices seeking to improve the speed and efficiency of evaluating and triaging patients must utilize telephone personnel who are often non-clinical staff. These telephone triage personnel may be limited in their knowledge and ability to obtain the necessary details of the patient’s medical symptoms and direct medical care accordingly. Their role is not to make diagnoses by phone, but rather to collect sufficient data related to the patient’s complaints and assign them appropriately in order to get the patient to the right level of care with the right provider in the right place at the right time.

Using Social Media for Tobacco Regulatory Intelligence

January 4, 2021 in Open Spring 2021, Open Summer 2021, Open Flexible Timeline 2021

This project uses NLP techniques to discover 1) new e-cig products online and 2) proposed health claims that users advocate for.

Augmented Supervised Topic Models

May 18, 2020 in Project Summer 2020-2

In this project we’ll be expanding on the existing family of supervised topic models. These models extend LDA to document collections where, for each document, we observe additional labels or values of interest. More specifically, one of the goals of this project is to use additional document-level data, such as author information, to develop better exploratory data tools.

Columbia Data Science Institute (DSI) Scholars Program