On the use of applied machine learning and digital infrastructure to leverage social media data in health and epidemiology

Martin Mathias Müller
2021
EPFL thesis

Abstract

The quantification of population-level health behaviors is crucial for guiding public health policy. However, traditional methods for measuring such health behaviors have several shortcomings. In recent years, social media data has been successfully used to measure health behaviors and may be used as a low-cost and real-time addition to traditional data sources. Methods from the field of natural language processing are increasingly used to automatically process, filter and categorize the rapidly growing amount of publicly available social media data. However, a number of methodological challenges limit the rate at which we can generate insight from such data. In this work I will argue that long-term investment in digital infrastructure and open source tooling is required in order to overcome these challenges. In chapter 2 we introduce the Crowdbreaks platform, which is the basis of this thesis. Crowdbreaks is an open source framework for real-time data collection, continuous crowdsourced annotation, and continuous re-training of machine learning classifiers. In contrast to traditional research workflows, projects on Crowdbreaks run over an extended period of time, allowing for the observation of health trends over multiple years while keeping algorithms up-to-date. In chapter 3 we quantify the occurrence of concept drift in vaccine-related Twitter data, which further validates the need for the Crowdbreaks platform. In chapter 4 we use the Crowdbreaks platform to trace sentiment towards the novel gene-editing technology CRISPR/Cas9 back to its first application in 2013 and investigate how public opinion may have been affected in the context of recent scandals surrounding the technology. In chapter 5 we turn our attention to the COVID-19 pandemic and analyze who was speaking and who was heard in the early months of the pandemic. Chapter 6 builds on this work and explores the dynamics of Twitter communities during the COVID-19 pandemic. Lastly, in chapter 7 we introduce COVID-Twitter-BERT, a domain-specific language model which has been used in various downstream natural language processing applications on COVID-19-related Twitter data.

Official source

https://infoscience.epfl.ch/record/283397?ln=en
