From politicians to podcast hosts, online platforms have systematically banned ("deplatformed") influential users for breaking platform guidelines. Previous inquiries on the effectiveness of this intervention are inconclusive because 1) they consider onl ...
Early work has found that large language models (LLMs) can generate persuasive content. However, evidence on whether they can also personalize arguments to individual attributes remains limited, despite being crucial for assessing misuse. This preregistere ...
Despite the remarkable performance of large language models (LLMs), they still struggle with generating logically sound arguments, resulting in potential risks such as spreading misinformation. An important factor contributing to LLMs' suboptimal performan ...
Wikipedia is a well-known platform for disseminating knowledge, and scientific sources, such as journal articles, play a critical role in supporting its mission. The open access movement aims to make scientific knowledge openly available, and we might intu ...
Effective content moderation in online communities is often a delicate balance between maintaining content quality and fostering user participation. In this paper, we introduce post guidance, a novel approach to community moderation that proactively guides ...
Given the prompt “Rome is in”, can we steer a language model to flip its prediction of an incorrect token “France” to a correct token “Italy” by only multiplying a few relevant activation vectors with scalars? We argue that successfully intervening on a mo ...
Association for Computational Linguistics (ACL) 2024
A rapidly growing number of applications rely on a small set of closed-source language models (LMs). This dependency might introduce novel security risks if LMs develop self-recognition capabilities. Inspired by human identity verification methods, we prop ...
Large language models (LLMs) have been shown to perform better when asked to reason step-by-step before answering a question. However, it is unclear to what degree the model's final answer is faithful to the stated reasoning steps. In this paper, we perfor ...
Association for Computational Linguistics (ACL) 2024