Publication: Localizing the Source of an Epidemic Using Few Observations

Abstract

Localizing the source of an epidemic is a crucial task in many contexts, including the detection of malicious users in social networks and the identification of patient zeros of disease outbreaks. The difficulty of this task lies in the strict limitations on the available data: in most cases, when an epidemic spreads, only a few individuals, whom we call sensors, provide information about their state. Furthermore, as the spread of an epidemic usually depends on a large number of variables, accounting for all the possible spreading patterns that could explain the available data can easily result in prohibitive computational costs. Therefore, in the field of source localization, there are two central research directions: the design of practical and reliable algorithms for localizing the source despite the limited data, and the optimization of data collection, i.e., the identification of the most informative sensors. In this dissertation we contribute to both of these directions.

We consider network epidemics starting from an unknown source. The only information available is provided by a set of sensor nodes that reveal if and when they become infected. We study how many sensors are needed to guarantee the identification of the source. A set of sensors that guarantees the identification of the source is called a double resolving set (DRS); the minimum size of a DRS is called the double metric dimension (DMD). Computing the DMD is, in general, hard, so estimating it with bounds is desirable. We focus on G(N,p) random networks, for which we derive tight bounds on the DMD. We show that the DMD is a non-monotonic function of the parameter p; hence there are critical parameter ranges in which source localization is particularly difficult.

Again building on the relationship between source localization and DRSs, we move to optimizing the choice of a fixed number K of sensors. First, we look at the case of trees, where the uniqueness of paths makes the problem simpler. For this case, we design polynomial-time algorithms for selecting K sensors that optimize certain metrics of interest. Next, turning to general networks, we show that the optimal sensor set depends on the distribution of the time it takes for an infected node u to infect a non-infected neighbor v, which we call the transmission delay from u to v. We consider both a low-variance and a high-variance regime for the transmission delays. We design algorithms for sensor placement in both cases, and we show that they yield an improvement of up to 50% over state-of-the-art methods.

Finally, we propose a framework for source localization in which some sensors (called dynamic sensors) can be added while the epidemic spreads and the localization progresses. We design an algorithm for joint source localization and dynamic sensor placement. This algorithm can handle two regimes: offline localization, where we localize the source after the epidemic has spread, and online localization, where we localize the source while the epidemic is ongoing. We conduct an empirical study of offline and online localization and show that, by using dynamic sensors, the number of sensors needed to localize the source is up to 10 times smaller than with a strategy in which all sensors are deployed a priori. We also study the robustness of our methods to high-variance transmission delays and show that, even in this setting, using dynamic sensors, the source can be localized with fewer than 5% of the nodes being sensors.
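
To make the link between source localization and double resolving sets concrete, here is a minimal sketch. It is not the dissertation's algorithm. It assumes the standard graph-theoretic definition of a DRS (for every pair of distinct nodes u, v there are two sensors s, s' with d(u,s) - d(u,s') ≠ d(v,s) - d(v,s')), a connected, undirected, unweighted graph, and deterministic unit transmission delays with an unknown start time. Under those assumptions, sensor s is infected at time t0 + d(source, s), so differences between sensor infection times do not depend on t0. The function names and the adjacency-dict representation are illustrative choices.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distances from `start` to every reachable node (unweighted graph)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def relative_signatures(adj, sensors):
    """Map each node u to the tuple (d(u, s) - d(u, s0)) over all sensors s.

    s0 is the first sensor. Subtracting d(u, s0) mirrors subtracting the
    first sensor's infection time, which cancels the unknown start time.
    Assumes a connected, undirected graph and a non-empty sensor list.
    """
    sensors = list(sensors)
    dist = {s: bfs_distances(adj, s) for s in sensors}
    s0 = sensors[0]
    return {u: tuple(dist[s][u] - dist[s0][u] for s in sensors) for u in adj}


def is_double_resolving(adj, sensors):
    """True if every node has a distinct relative-distance signature."""
    sensors = list(sensors)
    if not sensors:
        return len(adj) <= 1
    sigs = relative_signatures(adj, sensors)
    return len(set(sigs.values())) == len(sigs)


def localize_source(adj, infection_times):
    """Return all nodes consistent with the observed sensor infection times.

    `infection_times` maps each sensor to its observed infection time
    (assumption: deterministic unit delays, unknown start time).
    """
    sensors = list(infection_times)
    sigs = relative_signatures(adj, sensors)
    t_first = infection_times[sensors[0]]
    observed = tuple(infection_times[s] - t_first for s in sensors)
    return [u for u, sig in sigs.items() if sig == observed]


# Path graph 0 - 1 - 2 - 3 with sensors at both ends.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_double_resolving(path, [0, 3]))
# Source 1, start time 5: sensor 0 is infected at 6, sensor 3 at 7.
print(localize_source(path, {0: 6, 3: 7}))
```

When the sensor set is a DRS, `localize_source` returns exactly one candidate in this idealized setting; when it is not, several nodes can share a signature and the source is ambiguous. Random transmission delays, which the abstract also considers, are outside the scope of this sketch.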

Related concepts (36)

Related publications (128)

Data

In common usage and statistics, data (US: /ˈdætə/; UK: /ˈdeɪtə/) is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data is usually organized into structures such as tables that provide additional context and meaning, and which may themselves be used as data in larger structures.

Network science

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes (or vertices) and the connections between the elements or actors as links (or edges). The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology.

Epidemic

An epidemic (from Greek ἐπί epi "upon or above" and δῆμος demos "people") is the rapid spread of disease to a large number of hosts in a given population within a short period of time. For example, in meningococcal infections, an attack rate in excess of 15 cases per 100,000 people for two consecutive weeks is considered an epidemic. Epidemics of infectious disease are generally caused by several factors including a change in the ecology of the host population (e.g.

Related MOOCs (33)

Digital Signal Processing I

Basic signal processing concepts, Fourier analysis and filters. This module can be used as a starting point or a basic refresher in elementary DSP.

Digital Signal Processing II

Adaptive signal processing, A/D and D/A. This module provides the basic tools for adaptive filtering and a solid mathematical framework for sampling and quantization.

Digital Signal Processing III

Advanced topics: this module covers real-time audio processing (with
examples on a hardware board), image processing and communication system design.

Whether it be for environmental sensing or Internet of Things (IoT) applications, sensor networks are of growing use thanks to their large-scale sensing and distributed data storage abilities. However, when used in hazardous conditions and thus undergoing ...

2023

The field of synthetic data is more and more present in our everyday life. The transportation domain is particularly interested in improving the methods for the generation of synthetic data in order to address the privacy and availability issue of real dat ...

2023

Catherine Dehollain, Naci Pekçokgüler

Wearable and implantable medical devices are of great importance in diagnosis and treatment as they provide continuous monitoring and data collection. Considering the comfort of the patient and ease-of-operation, these devices require wireless data transmi ...