This paper presents a novel hybrid framework for generating and updating a synthetic population. We call it hybrid because it combines model-based and data-driven approaches. Existing generators produce a snapshot of synthetic data that becomes outdated over time, so every update requires complete regeneration from the newest datasets. By leveraging regularly collected data, we propose a method that provides an up-to-date synthetic population at any given moment without complete regeneration. Our approach generates a baseline synthetic population once, using Markov chain Monte Carlo simulation, and projects it over time. When disaggregated real data are unavailable, we project the synthetic sample by simulating life-changing events. When new disaggregated real data become available, we calibrate the projected sample by resampling to account for data collection biases and projection errors. We implement and test our approach on the 2010, 2015, and 2021 Swiss mobility and transport micro-census data. We generate the baseline sample from the 2010 data and project it to 2021. We compare the projections of our hybrid approach with two existing methods, dynamic projection and resampling. The results demonstrate that the synthetic sample generated by the hybrid approach improves the fit to the real data compared to dynamic projection, and improves heterogeneity compared to resampling.
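The project-then-calibrate loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the baseline here is drawn at random rather than by Markov chain Monte Carlo, the only life event is a toy mortality draw, and the function names, the `death_rate` value, the two age groups, and the target shares are all hypothetical choices made for illustration.

```python
import random


def project_one_year(population, rng, death_rate=0.01):
    """Project the sample by one year: age every agent and apply a toy
    mortality event (death_rate is a made-up constant, not an estimate)."""
    survivors = []
    for person in population:
        if rng.random() < death_rate:
            continue  # agent leaves the population
        survivors.append({**person, "age": person["age"] + 1})
    return survivors


def age_group(person):
    """Hypothetical two-way split used as the calibration variable."""
    return "under_40" if person["age"] < 40 else "40_plus"


def calibrate_by_resampling(population, target_shares, size, rng):
    """Resample agents with replacement, weighting each one so that the
    expected age-group shares match target_shares (assumed to come from
    newly available real data)."""
    counts = {}
    for person in population:
        group = age_group(person)
        counts[group] = counts.get(group, 0) + 1
    # Each agent's weight is its group's target share divided by the
    # group's current size, so every group's total weight equals its target.
    weights = [
        target_shares[age_group(person)] / counts[age_group(person)]
        for person in population
    ]
    return rng.choices(population, weights=weights, k=size)


rng = random.Random(0)

# Stand-in baseline for 2010 (random draw; the paper uses MCMC instead).
baseline = [{"age": rng.randint(18, 80)} for _ in range(1000)]

# Dynamic projection from 2010 to 2021: eleven yearly steps.
projected = baseline
for _ in range(11):
    projected = project_one_year(projected, rng)

# Calibration step once new data are available (shares are invented).
targets = {"under_40": 0.4, "40_plus": 0.6}
calibrated = calibrate_by_resampling(projected, targets, 1000, rng)
```

In this sketch the projection step alone lets the age distribution drift, while the resampling step pulls the group shares back toward the observed targets, which mirrors the division of labour between the two components of the hybrid approach.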