The extensive and successful use of social media has enhanced and empowered a variety of movements all over the world in ways that are hard to achieve through conventional means. This has led to numerous studies that leverage online social data to describe, analyze, model, or gain insights into online activism, as well as to empower activists and facilitate information sharing among them. Despite the recent success of public movements, owing to the efficiency that online tools have brought to activists, it is still unclear which factors impact success. In addition, support for online activism remains scarce. This thesis investigates these issues with respect to, first, the analysis and modeling of online activism in several forms and, second, algorithmic approaches to producing unstructured texts for individual users and filtering social media streams. To address these issues, we conduct separate studies in which we apply consistent methodologies to profile online campaigns, devise systematically interpretable models for online petitions, propose and assess efficient template induction tools for email composition, and design and evaluate an efficient and accurate approach for filtering topical short texts. The main contributions of this thesis are: 1) Gained insights into online public campaigns. We ran a comparative study contrasting over a hundred awareness and mobilization public campaigns on social media with regard to the online and offline actions performed by the activists. To this end, we introduced a generic methodology for categorizing online campaigns based on their goals and user engagement, and extracted the campaigns' actions from their social media traces. We discovered substantial differences between the types of campaigns and their corresponding actions.
2) Scrutinized and quantified the effect of external sources on online reinforced phenomena, namely over 4,000 e-petitions, and proposed an accurate and interpretable model that dissects the impact of various confounders on the time evolution of a petition's signatures. We showed how the performance of the designed model varies with different combinations of external factors; the model outperforms multiple baselines and is interpretable across various petitions. Our findings suggest that effects from social media are prolonged and stronger for successful petitions, while direct promotion has the strongest effect. 3) Assessed the extent of repetitive content in targeted email messages. We defined the task of template induction over unstructured email corpora and proposed an efficient and accurate algorithm that, first, identifies repetitive and representative phrases that a user typically types and, second, aligns these phrases into a template. We found that over 1% of email users might benefit from templatization, and we also uncovered the potential to save up to several dozen words of email writing effort, which in turn is an essential amenity for activists. 4) Improved document filtering for a particular topic or event by introducing a method that increases the accuracy of filtered short-text samples while preserving efficiency and recall for small training sets. To locate and monitor particular topics or events, this method constructs and applies a filter made of automatically generated sets of patterns that represent semantically homogeneous groups of input data.
Daniel Gatica-Perez, Haeeun Kim