Data For Good: Operationalize Project Cash Flow Model

September 8, 2020 in Open Projects Fall 2020, Data For Good

In collaboration with DDC, the Microsoft AI team has developed a predictive machine learning model that forecasts the monthly distribution of cash flow for DDC’s active projects. DDC intends to operationalize this model and possibly integrate it into our dashboards. We are seeking a data scientist to collaborate with DDC on this work: DDC can prepare the visuals, and the data scientist can assist with operationalizing the machine learning components.

Data For Good: The Cost of Human Rights Violations

September 8, 2020 in Open Projects Fall 2020, Data For Good

Rights CoLab is working with the Sustainability Accounting Standards Board (SASB) to develop and define a strengthened set of disclosure standards that investors can use to persuade companies to improve labor rights for both direct employees and workers in their supply chains. The project has two components: a data science project and an Independent Advisory Group. Our coalition of labor experts, data scientists, and SASB partners is focused on improving social disclosure standards that drive real gains in human rights.

Estimating Social Influence with Probabilistic Machine Learning

September 8, 2020 in Closed Projects Fall 2020

We are developing machine learning (ML) methods to understand how people influence each other’s behavior in social networks. For example, on Twitter, do users influence the content shared or posted by their followers? Methods that can identify such patterns of influence will play a role in studying, e.g., the spread of misinformation on social media sites.

Using machine learning to understand tropical cyclone genesis pathways

September 8, 2020 in Open Projects Fall 2020

To date, there is no comprehensive theory for the formation of tropical cyclones (hurricanes, typhoons). It is therefore common to use statistical methods to derive empirical indices as proxies for the probability of genesis. Different types of genesis pathways have also been explored, but only in an ad hoc manner. I would like to explore the possibility of using machine learning to study tropical cyclone genesis, and in particular its different pathways, in a more comprehensive manner.

Augmented Supervised Topic Models

May 18, 2020 in Project Summer 2020-2

In this project, we’ll be expanding on the existing family of supervised topic models. These models extend LDA to document collections where, for each document, we observe additional labels or values of interest. More specifically, one of the goals of this project is to use additional document-level data, such as author information, to develop better exploratory data tools.

Characterizing network behavior in phishing emails

May 18, 2020 in Project Summer 2020-2

Targeted phishing is one of the most common and damaging cybersecurity attacks, incurring tens of billions of dollars in losses a year. In order to increase the success of the phishing emails, attackers often craft emails that impersonate real people or legitimate online services, and send them from networks and hosting sites that have a high reputation. This leads major email security services, including Outlook and Gmail, to often misclassify these emails as legitimate.

Data For Good Project: The Cost of Human Rights Violations

May 18, 2020 in Project Summer 2020-2

Under United States securities laws, corporations must disclose material risks to their operations. Human rights issues, especially in authoritarian countries, rarely show up in the information that data providers offer to investors, in part because of the risks to those subject to these abuses. The result is a dearth of data on human rights materiality and a tendency among investors to overlook the human rights risks of the companies that they finance.

Columbia Data Science Institute (DSI) Scholars Program