On the use of applied machine learning and digital infrastructure to leverage social media data in health and epidemiology

Martin Mathias Müller
2021
EPFL thesis

Abstract

The quantification of population-level health behaviors is crucial for guiding public health policy. However, traditional methods for measuring such health behaviors have several shortcomings. In recent years, social media data has been successfully used to measure health behaviors and may be used as a low-cost and real-time addition to traditional data sources. Methods from the field of natural language processing are increasingly used to automatically process, filter and categorize the rapidly growing amount of publicly available social media data. However, a number of methodological challenges limit the rate at which we can generate insight from such data. In this work I will argue that long-term investment in digital infrastructure and open source tooling is required in order to overcome these challenges. In chapter 2 we introduce the Crowdbreaks platform, which is the basis of this thesis. Crowdbreaks is an open source framework for real-time data collection, continuous crowdsourced annotation, and continuous re-training of machine learning classifiers. In contrast to traditional research workflows, projects on Crowdbreaks run over an extended period of time, allowing for the observation of health trends over multiple years while keeping algorithms up-to-date. In chapter 3 we quantify the occurrence of concept drift in vaccine-related Twitter data, which further validates the need for the Crowdbreaks platform. In chapter 4 we use the Crowdbreaks platform to trace sentiment towards the novel gene-editing technology CRISPR/Cas9 back to its first application in 2013 and investigate how public opinion may have been affected in the context of recent scandals surrounding the technology. In chapter 5 we turn our attention to the COVID-19 pandemic and analyze who was speaking and who was heard in the early months of the pandemic. Chapter 6 builds on this work and explores the dynamics of Twitter communities during the COVID-19 pandemic. Lastly, in chapter 7 we introduce COVID-Twitter-BERT, a domain-specific language model which has been used in various downstream natural language processing applications on COVID-19-related Twitter data.

Official source

https://infoscience.epfl.ch/record/283397?ln=fr
