Publication

Towards Novel Evaluation Methods for Social Dialog Systems

Ekaterina Svikhnushina
2023
EPFL thesis
Abstract

Language has shaped human evolution and led to the desire to endow machines with language abilities. Recent advancements in natural language processing enable us to achieve this breakthrough in human-machine interaction. However, introducing conversational agents with enhanced language skills raises concerns about their emotional and social engagement. To ensure their acceptance, mechanisms for control and evaluation must be established. Yet creating meaningful evaluation metrics for social chatbots is challenging because the field is new, loosely defined, and lacks clear design guidelines. In this thesis, we contribute novel, effective evaluation frameworks for social chatbots, developed according to human-centered research principles. The thesis is structured into three parts.
The first part introduces two studies that explore users' expectations of conversational chatbots and how those expectations relate to their present experiences. The first study combines qualitative semi-structured interviews with quantitative survey analysis to establish a model of the essential social qualities expected from chatbots: politeness, entertainment, attentive curiosity, and empathy (PEACE). The second study examines online chatbot reviews and reveals a discrepancy between users' expectations and their current experiences, highlighting the need for chatbots to possess more advanced social capabilities.
The second part of the thesis focuses on attentive curiosity, an essential element that has received limited attention in the study of social chatbots. We propose EQT, a taxonomy of tags that distinguishes the functions of empathetic questions in social interactions. Additionally, we develop automatic classifiers for these labels, allowing us to investigate which question-asking strategies are most effective in specific emotional contexts. This analysis sheds light on the suitability of various approaches for fostering engagement and understanding in social conversations.
In the third part, we build on our earlier findings to create comprehensive evaluation frameworks for social chatbots. First, we introduce iEval, a human evaluation framework specifically designed to capture users' subjective perceptions of their conversational partners during interactive exchanges. Using this framework, we benchmark four state-of-the-art empathetic chatbots and examine the discourse factors that account for the differences in their performance. Additionally, we show how our evaluation framework can be automated by prompting the latest large language models. This enables us to approximate live user studies and achieve a very strong correlation with human judgment. The novel findings presented here enhance our understanding of user interaction with conversational technologies. Moreover, the developed evaluation criteria and frameworks provide valuable insights and tools for informing the design of future social chatbots.
