In this paper we present an empirical study of a workload gathered by crawling the eDonkey network, a dominant peer-to-peer file sharing system, for over 50 days. We first confirm the presence of some known features, in particular the prevalence of free-riding and the Zipf-like distribution of file popularity. We also analyze the evolution of document popularity. We then provide an in-depth analysis of several clustering properties of such workloads. We measure the geographical clustering of peers offering a given file, and find that most files are offered mostly by peers of a single country, although popular files do not have such a clear home country. We then analyze the overlap between contents offered by different peers, and find that peer contents are highly clustered according to several metrics of interest. We propose to leverage this property by allowing peers to search for content without server support, by querying suitably identified semantic neighbours. We find via trace-driven simulations that this approach is generally effective, and is even more effective for rare files. If we further allow peers to query both their semantic neighbours and, in turn, their neighbours’ neighbours, we attain hit rates of over 55% for neighbour lists of size 20.
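The server-less search idea in the abstract can be illustrated with a small sketch. This is only a toy model under stated assumptions, not the paper's implementation: the `Peer` class, the `remember` and `search` names, and the choice to keep the most recently useful peers in a bounded list are all hypothetical. The abstract itself only says that neighbours are "suitably identified" and that a two-hop variant also queries the neighbours' neighbours.

```python
from collections import OrderedDict


class Peer:
    """Toy peer: a set of shared files plus a bounded semantic-neighbour list."""

    def __init__(self, name, files, max_neighbours=20):
        self.name = name
        self.files = set(files)
        self.max_neighbours = max_neighbours
        # Assumption: neighbours are kept in "least recently useful first" order.
        self.neighbours = OrderedDict()

    def remember(self, other):
        """Record that `other` was useful, evicting the oldest entry if full."""
        if other is self:
            return
        self.neighbours.pop(other.name, None)
        self.neighbours[other.name] = other
        while len(self.neighbours) > self.max_neighbours:
            self.neighbours.popitem(last=False)


def search(peer, file_id, two_hop=False):
    """Return a neighbour holding `file_id`, or None if the lookup misses."""
    first_hop = list(reversed(list(peer.neighbours.values())))
    for neighbour in first_hop:
        if file_id in neighbour.files:
            return neighbour
    if two_hop:
        # Also ask each neighbour's own neighbours, skipping peers already seen.
        seen = {peer.name} | {n.name for n in first_hop}
        for neighbour in first_hop:
            for second in reversed(list(neighbour.neighbours.values())):
                if second.name in seen:
                    continue
                seen.add(second.name)
                if file_id in second.files:
                    return second
    return None


# Hypothetical three-peer example.
alice = Peer("alice", {"a"})
bob = Peer("bob", {"b", "c"})
carol = Peer("carol", {"d"})
alice.remember(bob)
bob.remember(carol)

one_hop_hit = search(alice, "c")
one_hop_miss = search(alice, "d")
two_hop_hit = search(alice, "d", two_hop=True)
```

In this example `alice` finds file `c` directly at `bob`, while file `d` is only reachable through the two-hop variant. A hit rate in the sense of the abstract would be the fraction of requests in a trace for which `search` returns a peer rather than `None`.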