On the use of applied machine learning and digital infrastructure to leverage social media data in health and epidemiology

The quantification of population-level health behaviors is crucial for guiding public health policy. However, traditional methods for measuring such health behaviors have several shortcomings. In recent years, social media data has been successfully used to measure health behaviors and may serve as a low-cost, real-time addition to traditional data sources. Methods from the field of natural language processing are increasingly used to automatically process, filter and categorize the rapidly growing amount of publicly available social media data. However, a number of methodological challenges limit the rate at which we can generate insight from such data. In this work I will argue that long-term investment in digital infrastructure and open source tooling is required to overcome these challenges. In chapter 2 we introduce the Crowdbreaks platform, which is the basis of this thesis. Crowdbreaks is an open source framework for real-time data collection, continuous crowdsourced annotation, and continuous re-training of machine learning classifiers. In contrast to traditional research workflows, projects on Crowdbreaks run over an extended period of time, allowing for the observation of health trends over multiple years while keeping algorithms up-to-date. In chapter 3 we quantify the occurrence of concept drift in vaccine-related Twitter data, which further validates the need for the Crowdbreaks platform. In chapter 4 we use the Crowdbreaks platform to trace sentiment towards the novel gene-editing technology CRISPR/Cas9 back to its first application in 2013 and investigate how public opinion may have been affected in the context of recent scandals surrounding the technology. In chapter 5 we turn our attention to the COVID-19 pandemic and analyze who was speaking and who was heard in the early months of the pandemic. Chapter 6 builds on this work and explores the dynamics of Twitter communities during the COVID-19 pandemic.
Lastly, in chapter 7 we introduce COVID-Twitter-BERT, a domain-specific language model which has been used in various downstream natural language processing applications on COVID-19-related Twitter data.


