This project has a two-fold aim. First, we seek to determine what makes an idea seem novel versus ordinary, and whether there is an ideal mix of the two. Second, building on these findings, we will build a generative model that suggests tweaks to an idea to enhance its perceived creativity and appeal. We will pursue these two aims using 69K recipes and reviews from allrecipes.com. We will use NLP approaches to extract important features from each recipe, such as ingredients, preparation instructions, and review content.


With the explosive growth of medical literature, making sense of medical evidence is harder than ever. The free-text form also makes it difficult to perform evidence retrieval or appraisal. There is a great need for tools and methods that can structure and reason over medical evidence. The goal of this project is to develop computational and symbolic methods to extract evidence from PubMed abstracts, integrate it with evidence derived from real-world clinical data (or practice-based evidence), and perform automated knowledge discovery and evidence reasoning. We also hope this research can support evidence-based medicine during the COVID-19 pandemic and provide opportunities for students to hone their skills in natural language processing, data mining, deep learning, and semantic knowledge engineering. We have solid preliminary results for the students to build upon. An open-source PICO parser that extracts Population, Intervention, Comparison, and Outcome information from PubMed abstracts has been developed and published. Current COVID-19 literature has been downloaded from PubMed and pre-processed. Preliminary analyses are under way to investigate patterns in the study populations of COVID-19 clinical studies. Our next steps include, but are not limited to, evidence summarization at the study level and evidence reasoning at the problem/topic level.


The CONCERN project aims to develop models and tools to quantify clinician concern about patient deterioration in the inpatient setting that can be used in early warning scores. We have discovered and validated several measurable indicators of clinician concern within the Electronic Health Record (EHR) and have demonstrated that our approach identifies patients at risk of deterioration earlier than other methods, which focus only on physiological data. One of our approaches leverages documentation of certain concepts within narrative text in nursing notes that are consistent with concern about a patient. However, this narrative free text is not easily accessible: it is often mixed together with structured or templated text and varies across note types. The steps to be performed are


Health care professionals cannot examine every person calling the office with a question nor can they return every call. Therefore, medical offices seeking to improve the speed and efficiency of evaluating and triaging patients must utilize telephone personnel who are often non-clinical staff. These telephone triage personnel may be limited in their knowledge and ability to obtain the necessary details of the patient’s medical symptoms and direct medical care accordingly. Their role is not to make diagnoses by phone, but rather to collect sufficient data related to the patient’s complaints and assign them appropriately in order to get the patient to the right level of care with the right provider in the right place at the right time.


In this project we’ll be expanding on the existing family of supervised topic models. These models extend LDA to document collections where, for each document, we observe additional labels or values of interest. More specifically, one of the goals of this project is to use additional document-level data, such as author information, to develop better exploratory data tools.



Columbia Data Science Institute (DSI) Scholars Program

The DSI Scholars Program aims to engage and support undergraduate and master's students in data science research with Columbia faculty. The program’s unique enrichment activities will foster a learning and collaborative community in data science at Columbia.

Columbia University DSI

New York, NY