Peer-to-peer (P2P) architectures have emerged as a popular paradigm for supporting the dynamic and scalable nature of distributed systems. This is particularly relevant today, given the tremendous increase in the volume of information exchanged over the Internet. A P2P system is typically composed of participants that are willing to contribute resources, such as memory or bandwidth, to the execution of a collaborative task that benefits all participants. File sharing is probably the most widely used collaborative task: each participant wants to receive its own copy of some file, and users collaborate by sending fragments of the file they have already downloaded to other participants. Sharing files with multimedia content, which typically range from hundreds of megabytes to gigabytes, introduces a number of challenges. With typical participant bandwidths ranging from hundreds of kilobits per second to a couple of megabits per second, the download itself takes a non-negligible amount of time, so waiting for it to complete before being able to use the file is unacceptable. From the participant's point of view, getting the (entire) file as fast as possible is typically not good enough. Video on Demand (VoD), for example, is a scenario in which a participant wants to start previewing the multimedia content (the stream) offered by a source even though only a fraction of it has been received, and then continue viewing while the rest of the content is being received. Following the same line of reasoning, new applications have emerged that rely on live streaming: the source does not own a file that it wants to share with others, but shares content as soon as it is produced. In other words, the content to distribute is live, not pre-recorded and stored. Typical examples include the broadcasting of live sports events, conferences, or interviews.
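To make the bandwidth argument concrete, the short calculation below estimates how long a full download takes at a constant rate. The file sizes and link speeds are illustrative values picked from the ranges quoted above (decimal units, 1 GB = 10^9 bytes), and the sketch ignores protocol overhead and rate fluctuations.

```python
def download_time_hours(file_size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time needed to download a whole file at a constant rate, in hours."""
    bits = file_size_bytes * 8          # bytes -> bits
    seconds = bits / bandwidth_bits_per_s
    return seconds / 3600               # seconds -> hours


# 1 GB file over a 1 Mbit/s link
print(round(download_time_hours(1e9, 1e6), 2))      # 2.22
# 700 MB file over a 500 kbit/s link
print(round(download_time_hours(700e6, 500e3), 2))  # 3.11
```

Waiting two to three hours before playback starts is clearly incompatible with previewing content, let alone with following a live event.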
The gossip paradigm is a form of data dissemination that relies on random communication between participants in a P2P system, and resembles the epidemic spread of diseases. An epidemic starts when the source randomly chooses a set of communication partners, whose size is called the fanout, and infects them, i.e., shares a rumor with them. Each of these participants, in turn, randomly picks fanout communication partners and infects them, i.e., shares the same rumor with them. This paradigm has many advantages, including fast propagation of rumors, a probabilistic guarantee that each rumor reaches all participants, high resilience to churn (i.e., participants joining and leaving), and high scalability. Gossip therefore constitutes a candidate of choice for live streaming in large-scale systems. These advantages, however, come at a price. While disseminating data, gossip creates many duplicates of the same rumor, and participants usually receive multiple copies of it. This redundancy is an asset for guaranteeing good dissemination when churn is high, but it is a clear disadvantage when spreading large amounts of multimedia data (i.e., ordered and time-critical) to participants with limited resources, namely upload bandwidth in the case of high-bandwidth content dissemination. This thesis therefore investigates whether and how the gossip paradigm can be used as a highly efficient communication system for live streaming in the following specific scenarios: (i) when participants can only contribute limited resources, (ii) when these limited resources are heterogeneously distributed among nodes, and (iii) when only a fraction of participants contribute their fair share of work while others are freeriding.
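The epidemic process described above can be sketched as a small simulation. This is a minimal "infect-and-die" model, assuming a static system of `n` nodes with no churn, where each node forwards the rumor once, on first reception, to `fanout` partners chosen uniformly at random. The function name `gossip` and its parameters are hypothetical and chosen for illustration; the sketch also counts duplicate receptions, which is the cost discussed in the paragraph above.

```python
import random
from typing import Optional, Tuple


def gossip(n: int, fanout: int, source: int = 0,
           seed: Optional[int] = None) -> Tuple[int, int]:
    """Simulate one rumor spread with infect-and-die gossip among n nodes.

    Each node forwards the rumor once, when it first receives it, to `fanout`
    distinct partners drawn uniformly at random from the other n - 1 nodes.
    Returns (nodes reached, duplicate receptions).
    """
    rng = random.Random(seed)
    infected = {source}
    frontier = [source]
    duplicates = 0
    while frontier:
        next_frontier = []
        for node in frontier:
            # Sample from 0..n-2 and shift past `node` so a node never picks itself.
            for k in rng.sample(range(n - 1), fanout):
                partner = k if k < node else k + 1
                if partner in infected:
                    duplicates += 1  # partner already had the rumor: wasted message
                else:
                    infected.add(partner)
                    next_frontier.append(partner)
        frontier = next_frontier
    return len(infected), duplicates


reached, dup = gossip(n=1000, fanout=8, seed=1)
print(f"reached {reached}/1000 nodes, {dup} duplicate messages")
```

Because every reached node sends exactly `fanout` messages and only the first reception at each node is useful, the duplicate count always equals `reached * fanout - (reached - 1)`: roughly `fanout - 1` redundant copies per node. This is harmless for small rumors but expensive when each "rumor" is a chunk of video.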
To meet these challenges, this thesis proposes (i) gossip++, a gossip-based protocol specifically tailored for live streaming that separates the dissemination of metadata, i.e., the location of the data, from the dissemination of the data itself; by first spreading the location of the content to interested participants, the protocol avoids wasting bandwidth on sending and receiving duplicates of the payload; (ii) HEAP, a fanout adaptation mechanism that enables gossip to adapt each participant's contribution to its resources while still preserving reliability; and (iii) LiFT, a protocol that secures high-bandwidth gossip-based dissemination protocols against freeriders.
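The idea of separating metadata from payload can be illustrated with a three-phase exchange: a peer first proposes the identifiers of the chunks it holds, the receiver requests only the chunks it is missing, and the payload is served only for those. This is a minimal sketch assuming such a propose/request/serve pattern; the `Peer` class, its methods, and `exchange` are hypothetical names, and the sketch omits partner selection, timing, and retransmission.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer using a metadata-first (propose/request/serve) exchange."""
    name: str
    chunks: dict = field(default_factory=dict)  # chunk id -> payload bytes
    payload_bytes_received: int = 0

    def propose(self) -> set:
        # Phase 1: advertise only the ids of the chunks held (small metadata).
        return set(self.chunks)

    def request(self, proposed_ids: set) -> set:
        # Phase 2: ask only for chunks this peer does not already have.
        return {cid for cid in proposed_ids if cid not in self.chunks}

    def serve(self, requested_ids: set) -> dict:
        # Phase 3: send the payload of exactly the requested chunks.
        return {cid: self.chunks[cid] for cid in requested_ids}

    def receive(self, payloads: dict) -> None:
        for cid, data in payloads.items():
            self.chunks[cid] = data
            self.payload_bytes_received += len(data)


def exchange(sender: Peer, receiver: Peer) -> None:
    """One propose/request/serve round from `sender` to `receiver`."""
    wanted = receiver.request(sender.propose())
    receiver.receive(sender.serve(wanted))


a = Peer("a", {1: b"x" * 100, 2: b"y" * 100})
c = Peer("c", {1: b"x" * 100, 2: b"y" * 100})
b = Peer("b")
exchange(a, b)  # b fetches both chunks: 200 payload bytes
exchange(c, b)  # c proposes the same ids; b requests nothing
print(b.payload_bytes_received)  # 200
```

The second proposal still reaches `b`, so gossip's redundancy is preserved, but it only duplicates a few chunk identifiers rather than the payload itself.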
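One way to adapt contributions to heterogeneous resources, in the spirit of HEAP, is to give each node a fanout proportional to its upload capacity, scaled so that the average fanout across the system stays at a chosen target. The sketch below assumes exactly that rule, f_i = f_avg * b_i / mean(b), and also assumes that reliability depends mainly on the average fanout; both the rule and the helper names `adapted_fanouts` and `integer_fanout` are illustrative assumptions, not a statement of HEAP's precise mechanism.

```python
import random
from typing import List, Sequence


def adapted_fanouts(upload_kbps: Sequence[float], avg_fanout: float) -> List[float]:
    """Assumed HEAP-style rule: fanout proportional to upload capacity.

    f_i = avg_fanout * b_i / mean(b). Richer nodes forward to more partners,
    poorer ones to fewer, and the mean fanout over all nodes stays avg_fanout.
    """
    mean_b = sum(upload_kbps) / len(upload_kbps)
    return [avg_fanout * b / mean_b for b in upload_kbps]


def integer_fanout(f: float, rng: random.Random) -> int:
    """Round a fractional fanout up with probability equal to its fractional part,
    so the expected number of partners equals f."""
    base = int(f)  # floor for non-negative f
    return base + (1 if rng.random() < f - base else 0)


uploads = [512, 1024, 3072]  # illustrative upload capacities in kbit/s
fanouts = adapted_fanouts(uploads, avg_fanout=7)
print([round(f, 2) for f in fanouts])  # [2.33, 4.67, 14.0]
```

Randomized rounding lets each gossip round use a whole number of partners while keeping every node's expected contribution, and hence the system-wide average fanout, unchanged.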
Verónica del Carmen Estrada Galiñanes