Data-Driven, Personalized Usable Privacy

We live in an "inverse-privacy" world, where service providers derive insights from users' data that the users themselves do not even know about. This has been fueled by advances in machine learning, which have allowed providers to go beyond superficial analysis of users' transactions to deep inspection of users' content. Users have faced several problems in coping with this widening information discrepancy. Although the interfaces of apps and websites are generally equipped with privacy indicators (e.g., permissions and policies), these have not been enough to counteract it. We identify three gaps in particular that have hindered the effectiveness and usability of privacy indicators:

- Scale Adaptation: The scale at which service providers collect data has been growing on multiple fronts. Users, on the other hand, have limited time, effort, and technological resources to cope with this scale.
- Risk Communication: Although providers use privacy indicators to announce what information they need and (less often) why they need it, they rarely relay what can potentially be inferred from this data. Without this knowledge, users are less equipped to make informed decisions when they sign in to a site or install an application.
- Language Complexity: The information practices of service providers are buried in long, complex privacy policies. Users generally do not have the time, and sometimes the skills, to decipher such policies, even when they are interested in particular details in them.

In this thesis, we approach usable privacy from a data perspective. Instead of static privacy interfaces that are obscure, recurring, or unreadable, we develop techniques that bridge the understanding gap between users and service providers.
Toward that goal, we make the following contributions:

- Crowdsourced, data-driven privacy decision-making: In an effort to combat the growing scale of data exposure, we consider the context of files uploaded to cloud services. We propose C3P, a framework for automatically assessing the sensitivity of files, thus enabling real-time, fine-grained policy enforcement on top of unstructured data.
- Data-driven app privacy indicators: We introduce PrivySeal, which embodies a new paradigm of dynamic, personalized app privacy indicators that bridge the risk understanding gap between users and providers. Through PrivySeal's online platform, we also study the emerging problem of interdependent privacy in the context of cloud apps and provide a usable privacy indicator to mitigate it.
- Automated question answering about privacy practices: We introduce PriBot, the first automated question-answering system for privacy policies, which allows users to pose questions about the privacy practices of any company in their own language. Through a user study, we show its effectiveness at achieving high accuracy and relevance for users, thus narrowing the complexity gap in navigating privacy policies.

A core aim of this thesis is to pave the road for a future where privacy indicators are not bound by a specific medium or pre-scripted wording. We design and develop techniques that enable privacy to be communicated effectively in an interface that is approachable to the user. To that end, we go beyond textual interfaces to enable dynamic, visual, and hands-free privacy interfaces that fit the variety of emerging technologies.
