Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur Graph Search.
Artificial immune systems (AIS) use the concepts and algorithms inspired by the theory of how the human immune system works. This document presents the design and initial evaluation of a new artificial immune system for collaborative spam filtering. Collaborative spam filtering allows for the detection of not-previously-seen spam content, by exploiting its bulkiness. Our system uses two novel and possibly advantageous techniques for collaborative spam filtering. The first novelty is local processing of the signatures created from the emails prior to deciding whether and which of the generated signatures will be exchanged with other collaborating antispam systems. This processing exploits both the email-content profiles of the users and implicit or explicit feedback from the users, and it uses customized AIS algorithms. The idea is to enable only good quality and effective information to be exchanged among collaborating antispam systems. The second novelty is the representation of the email content, based on a sampling of text strings of a predefined length and at random positions within the emails, and a use of a custom similarity hashing of these strings. Compared to the existing signature generation methods, the proposed sampling and hashing are aimed at achieving a better resistance to spam obfuscation (especially text additions) - which means better detection of spam, and a better precision in learning spam patterns and distinguishing them well from normal text - which means lowering the false detection of good emails. Initial evaluation of the system shows that it achieves promising detection results under modest collaboration, and that it is rather resistant under the tested obfuscation. In order to confirm our understanding of why the system performed well under this initial evaluation, an additional factorial analysis should be done. Also, evaluation under more sophisticated spammer models is necessary for a more complete assessment of the system abilities.
Paul Arthur Adrien Pierre Dreyfus