In this project, we will study historical player transfer data from European professional football (n > 1,000,000 transfers). To supplement this analysis, we will also exploit data on player and team performance, team ownership and management, team finances, and player agents.
Many scholars and policymakers view establishing functioning data markets as essential for the digital economy to bring prosperity and stability to society at large. A key challenge is to determine the value of an individual’s specific data. Is one buyer’s data more valuable than another’s for an e-commerce platform? How much should each be paid?
This research project aims to explore and develop methods to improve and diversify visualizations of the interactions between the Internet and topographical and geopolitical space (i.e. space and the political actors that rule over it) through the case study of a region of interest (virtually any region that interests the student). The main intent of the project is to produce a set of maps and visualizations (including infographics where relevant), as comprehensive and diverse as possible, combining Internet mapping with the geographical and geopolitical context of that region. We will build on existing techniques for visualizing the Internet and discuss potential ways to model the Internet further.
This study is the first step in exploring an emerging and previously understudied data stream: verbal communication between healthcare providers and patients. In a partnership among Columbia Engineering, the School of Nursing, Amazon, and the largest home healthcare agency in the US, the study will investigate how to use audio-recorded routine communications between patients and nurses to help identify patients at risk of hospitalization or emergency department visits. The study will combine speech recognition, machine learning, and natural language processing to achieve its goals.
All volcanoes on Earth are driven by the degassing of volatile elements, mostly H2O and CO2, from their host magma. To model the degassing process, one needs to know the solubility laws of these volatiles. To that end, petrologists have been performing high-pressure, high-temperature experiments for sixty years to determine how much water and CO2 dissolve in magma as a function of pressure, temperature, melt composition (12 oxides), and oxidation state. To model how these fifteen parameters affect solubility laws, petrologists have relied on empirical interpolation between experimental data points, with some extrapolation using classical thermodynamic theory to infer the expected behavior beyond the experimental calibration range.
Call for Faculty Participation. Fall 2021.
The Data Science Institute is calling for faculty submissions of research projects for the spring and/or summer 2021 sessions of the Data Science Institute (DSI) Scholars Program. The goal of the DSI Scholars Program is to engage undergraduate and master's students to work with Columbia faculty through the creation of data science research internships. Last year, we worked with 38 projects and received more than 400 unique applications from Columbia students. The program’s unique enrichment activities foster a learning and collaborative community in data science at Columbia. Apply here.
Genome-wide CRISPR lethality screens show broad variability in cellular fitness phenotypes across cancers. We postulate that genes with overlapping functions should produce similar knockout responses, enabling functional annotation of uncharacterized genes. Here we will build a network connecting genes based on the similarity of their knockout phenotypes, benchmark this network using protein interaction databases and functional transcriptomics, and leverage network analyses to identify mutational and transcriptional modulators of functional complexes.
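As a rough illustration of the network-building step described above, the sketch below links genes whose knockout fitness profiles are strongly correlated across cell lines. It is not the project's method: the use of Pearson correlation, the 0.8 threshold, the function name `coessentiality_edges`, and the toy gene names and scores are all assumptions made for this example.

```python
import numpy as np

def coessentiality_edges(scores, genes, threshold=0.8):
    """List gene pairs whose knockout profiles correlate at or above threshold.

    scores: array of shape (n_genes, n_cell_lines); each row holds the
            fitness effect of knocking out one gene in each cell line.
    genes:  gene labels, one per row of scores.
    """
    # Pearson correlation between every pair of gene rows (assumed similarity measure)
    corr = np.corrcoef(scores)
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if corr[i, j] >= threshold:
                edges.append((genes[i], genes[j], float(corr[i, j])))
    return edges

# Hypothetical toy data: 3 genes screened in 4 cell lines
genes = ["GENE_A", "GENE_B", "GENE_C"]
scores = np.array([
    [-1.0, -0.2, -0.9,  0.1],   # GENE_A
    [-0.9, -0.1, -1.0,  0.0],   # GENE_B: profile similar to GENE_A
    [ 0.1, -1.0,  0.0, -0.8],   # GENE_C: roughly opposite pattern
])

edges = coessentiality_edges(scores, genes)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b} (r = {r:.2f})")
```

In this toy input only GENE_A and GENE_B are connected. On real screen data the edge list could then be compared against known protein interactions, which is the benchmarking step the abstract mentions.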