Sequence data are increasingly shared to enable mining applications in domains such as marketing, telecommunications, and healthcare. Sharing, however, may expose sensitive sequential patterns, which can lead to intrusive inferences about individuals or leak confidential information about organizations. This paper presents the first permutation-based approach to prevent this threat. Our approach hides sensitive patterns by replacing them with carefully selected permutations that avoid changes in the set of frequent nonsensitive patterns (side-effects) and in the ordering information of sequences (distortion). By doing so, it retains data utility both in sequence mining and in tasks based on itemset properties, because permutation preserves the support of items, unlike deletion, which is used in existing works. To realize our approach, we develop an efficient and effective algorithm for generating permutations with minimal side-effects and distortion. This algorithm also avoids implausible symbol orderings that may exist in certain applications. In addition, we propose a method to hide sensitive patterns from a sequence dataset. Extensive experiments verify that our method allows significantly more accurate data analysis than the state-of-the-art approach.
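The core idea of hiding by permutation can be illustrated with a small sketch. It is not the algorithm developed in the paper: it assumes a brute-force search over all reorderings of a short sequence, uses the number of changed positions as a crude stand-in for distortion, and ignores side-effects on nonsensitive patterns. The names `is_subsequence`, `support`, `hide_in_sequence`, and `hide_pattern` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(symbol in it for symbol in pattern)


def support(pattern, dataset):
    """Number of sequences in `dataset` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in dataset)


def hide_in_sequence(sequence, pattern):
    """Reorder `sequence` so it no longer contains `pattern`.

    Assumption: brute force over all permutations, so this is only
    practical for very short sequences. Among the reorderings that hide
    the pattern, the one changing the fewest positions is returned.
    Returns None when no reordering can hide the pattern.
    """
    best = None
    for candidate in set(permutations(sequence)):
        if is_subsequence(pattern, candidate):
            continue
        changed = sum(a != b for a, b in zip(candidate, sequence))
        key = (changed, candidate)  # candidate breaks ties deterministically
        if best is None or key < best:
            best = key
    return None if best is None else list(best[1])


def hide_pattern(dataset, pattern, max_support):
    """Permute supporting sequences until support is at most `max_support`."""
    result = [list(seq) for seq in dataset]
    for i, seq in enumerate(result):
        if support(pattern, result) <= max_support:
            break
        if is_subsequence(pattern, seq):
            hidden = hide_in_sequence(seq, pattern)
            if hidden is not None:
                result[i] = hidden
    return result


original = ["a", "b", "c", "d"]
hidden = hide_in_sequence(original, ["a", "c"])
print(hidden)  # ['c', 'b', 'a', 'd']
print(Counter(hidden) == Counter(original))  # True: item counts are unchanged
```

Because every symbol is kept and only its position changes, the per-item counts of each sequence are identical before and after hiding, which is the property that a deletion-based approach does not preserve.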
Thanh Trung Huynh, Quoc Viet Hung Nguyen, Thành Tâm Nguyên, Trung-Dung Hoang