Inference from data is of key importance in many applications of informatics. The current trend is to perform such inference with machine learning algorithms. Moreover, in many applications it is either required or preferable to infer from the data in a distributed manner. Many practical difficulties arise because, in many distributed applications, we avoid transferring the data or parts of it owing to cost, privacy and computation considerations. It would also be advantageous if the final knowledge attained through distributed data inference were common to every participating computing node. The key to achieving this is the distributed average consensus algorithm, referred to herein simply as the consensus algorithm. It has been used in many applications. Its original purpose was to estimate the expectation of scalar-valued data distributed over a network of machines without a central node. Notably, the algorithm allows the final outcome to be the same for every participating node. Using the consensus algorithm as the centrepiece makes distributed data inference feasible. However, several difficulties hinder its direct applicability, and we concentrate on the consensus algorithm in order to address them. There are two main concerns. First, the consensus algorithm converges only asymptotically, so maximum accuracy is achieved only if the algorithm is left to run for a large number of iterations. Second, the accuracy attained at any iteration of the consensus algorithm is correlated with the standard deviation of the initial value distribution. The consensus algorithm is therefore inherently imprecise at finite time, which makes the learning process harder. We solve this problem by introducing the definitive consensus algorithm.
This algorithm attains maximum precision in a finite number of iterations, namely a number of iterations equal to the diameter of the graph, in a distributed and decentralised manner. Additionally, we introduce the nonlinear consensus algorithm and the adaptive consensus algorithm. These are modifications of the original consensus algorithm that achieve improved precision with fewer iterations when the network topology is unknown, partially known or stochastically time-varying. The definitive consensus algorithm can be incorporated in a distributed data inference framework. We approach the problem of data inference from the perspective of machine learning. Specifically, we tailor this distributed inference framework to machine learning on a communication network, with the data partitioned across the participating computing nodes. In particular, the framework is detailed and applied to the case of a multilayer feed-forward neural network with error back-propagation. A substantial examination of its performance, and a comparison with the non-distributed case, is provided. A theoretical foundation for the definitive consensus algorithm is provided, and its superior performance is validated by numerical experiments. A brief theoretical examination of the nonlinear and adaptive consensus algorithms is performed to justify their improved performance with respect to the original consensus algorithm, and extensive numerical simulations compare them with the original consensus algorithm. The most important contributions of this research are the definitive consensus algorithm and the distributed data inference framework. Their combination yields a decentralised, distributed process over a communication network that is capable of inference in agreement over the entire network.
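To make the two concerns about the original consensus algorithm concrete, the following is a minimal sketch of the classical linear average consensus iteration, not the definitive, nonlinear or adaptive algorithms introduced in this work. The Metropolis weighting rule, the five-node path graph, the initial values and the function names `metropolis_weights` and `consensus` are all assumptions chosen for illustration.

```python
import numpy as np


def metropolis_weights(adjacency):
    """Symmetric, doubly stochastic weights for an undirected graph (illustrative choice)."""
    n = adjacency.shape[0]
    degrees = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                W[i, j] = 1.0 / (1.0 + max(degrees[i], degrees[j]))
        # The self-weight makes every row sum to one.
        W[i, i] = 1.0 - W[i].sum()
    return W


def consensus(W, x0, iterations):
    """Run x <- W x; each node only combines its own value with its neighbours' values."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(iterations):
        x = W @ x
        history.append(x.copy())
    return np.array(history)


# Hypothetical example: a path graph with 5 nodes (diameter 4), one scalar per node.
adjacency = np.zeros((5, 5), dtype=int)
for i in range(4):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1

x0 = np.array([1.0, 5.0, 2.0, 8.0, 4.0])
W = metropolis_weights(adjacency)
history = consensus(W, x0, iterations=200)

# Largest deviation from the true average at each iteration.
errors = np.abs(history - x0.mean()).max(axis=1)
```

In this sketch every node's value approaches the network average, but the error only shrinks geometrically and is still clearly nonzero after a number of iterations equal to the graph diameter. Because the iteration is linear, scaling the spread of the initial values scales the error at every iteration by the same factor, which is the dependence on the initial value distribution noted above.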
Florent Gérard Krzakala, Julien Marcel Daniel Emmanuel Launay