Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Understanding epidemic propagation in large networks is an important but challenging task, especially since we usually lack information, and the information that we have is often counter-intuitive. An illustrative example is the dependence of the final size of the epidemic on the location of the initial infected agents (sources): common sense dictates that the most dangerous location for the sources is the largest city, but the second chapter of the thesis shows that this holds true only if the epidemic is just above the infection threshold.Identifying the initial infected agents can help us better understand the epidemic. The focus of this thesis is on identifying the very first infected agent, also called the source or patient zero. According to the standard assumptions, a few agents reveal their symptom onset time, and then it is our goal to identify the source based on this information, together with full knowledge of the underlying network. Unfortunately, even if we can choose the set of agents that are queried about their symptom onset time, the number of queries required for reliable source identification is too large for practical applications. In this thesis, we carefully assess if this issue can be mitigated by introducing adaptivity to the standard assumptions. Our main goal is to study the reduction in the query complexity if the queries can be chosen adaptively to previous answers, but we also investigate whether adaptively querying the edges can relax the full knowledge assumption on the network.Providing rigorous proofs about source identification with time queries is difficult. A notable exception is when the infection is passed with a known, deterministic delay from each agent to all of its neighbors, in which case the number of required non-adaptive and adaptive queries are equivalent to well-known notions in combinatorics; the metric dimension (MD) and the sequential metric dimension (SMD), respectively. We extend previous results in the field by computing the MD of a large class of random trees, where adaptivity can significantly reduce the query complexity, and the SMD of Erdos-Rényi random networks, where the reduction is found to be small, at most a constant factor. We address the case of non-deterministic diffusion processes for the first time in the mathematical literature: on the path graph, we observe a striking, double logarithmic decrease in adaptive query complexity compared to the non-adaptive case.Our analysis on the robustness of the MD to adding a single edge to specially constructed and d-dimensional grid networks suggests that even small changes in the network could easily derail source identification algorithms. This is concerning since it is difficult to obtain a perfect dataset about the entire contact network in practice. Inspired by recent implementations of contact tracing, we propose new source identification assumptions, where not only the symptom onset times, but also the edges of the network are queried by the algorithm, resulting in less, but potentially higher quality information. We propose two local search algorithms that outperform state of the art identification algorithms tailored to the new assumptions, and we analytically approximate their success probabilities on realistic random graph models. The adaptive assumptions enable us to evaluate our algorithms on a COVID-19 epidemic simulator: the first time that source identification algorithms are tested on such a complex dataset.