Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Spatial count data models are used to explain and predict the frequency of phenomena such as traffic accidents in geographically distinct entities such as census tracts or road segments. These models are typically estimated using Bayesian Markov chain Monte Carlo (MCMC) simulation methods, which, however, are computationally expensive and do not scale well to large datasets. Variational Bayes (VB), a method from machine learning, addresses the shortcomings of MCMC by casting Bayesian estimation as an optimisation problem instead of a simulation problem. Considering all these advantages of VB, a VB method is derived for posterior inference in negative binomial models with unobserved parameter heterogeneity and spatial dependence. Polya-Gamma augmentation is used to deal with the non-conjugacy of the negative binomial likelihood and an integrated non-factorised specification of the variational distribution is adopted to capture posterior dependencies. The benefits of the proposed approach are demonstrated in a Monte Carlo study and an empirical application on estimating youth pedestrian injury counts in census tracts of New York City. The VB approach is around 45 to 50 times faster than MCMC on a regular eight-core processor in a simulation and an empirical study, while offering similar estimation and predictive accuracy. Conditional on the availability of computational resources, the embarrassingly parallel architecture of the proposed VB method can be exploited to further accelerate its estimation by up to 20 times. (C) 2020 Elsevier B.V. All rights reserved.
Michel Bierlaire, Prateek Bansal
Michel Bierlaire, Thomas Gasos, Prateek Bansal