This thesis presents work at the junction of statistics and climate science. We first provide methodology for use by climate scientists when performing fast event attribution using extreme value theory, and then describe two interdisciplinary projects in climate science that involve advanced statistical techniques.

The first chapter connects the climate literature on fast extreme event attribution studies with the statistical literature on selection effects. It provides simulations in the univariate and bivariate settings showing that not accounting for the stopping rule can lead to misestimation of return levels, but that bias can be reduced by more appropriate analysis. We discuss the spatial selection bias induced by the systematic study of the location where the extreme event happened, and show that the estimated return period for the "trigger event" based on a dataset that contains this event can be both biased and very uncertain. We illustrate the impact of timing and spatial selection bias on return level estimation with analysis of environmental data inspired by real use cases. The Appendix describes a Python package for likelihood inference that was useful for the simulations and case studies in this chapter.

The rest of the thesis describes two applications of machine learning and statistics in climate science. The first topic studied is downscaling of historical wind fields in Switzerland. High-resolution wind maps are essential to climate scientists looking to study past climate events such as wildfires and avalanches. The deep learning model proposed in the second chapter provides realistic-looking high-resolution (1.1 km) historical maps of gridded hourly wind fields over Switzerland from ERA5 input on a 25 km grid. The downscaled wind fields demonstrate physically plausible orographic effects, such as ridge acceleration and sheltering, which are not resolved in the original ERA5 fields.
The prediction of the aggregated wind speed distribution is very good and robust. Regionally averaged image-specific metrics are generally better for locations over the flatter Swiss Plateau than for Alpine regions.

The third chapter proposes a random line process for hail impact modelling. Hail damage is a major concern for insurance companies, because large hailstones tend to produce large economic losses. Appropriate modelling and uncertainty quantification for hail impact could also be a good starting point for studying the sensitivity of our economy to a changing climate. A two-step Bayesian hierarchical framework incorporating the random line process and extreme value theory is built to model the counts and values of hail impacts for individual buildings in the canton of Zürich, and is fitted using building insurance data. The results are compared with those from a benchmark deterministic hail impact function. The random line model with extreme marks proves better at capturing hail spatial patterns than the benchmark and allows for localised and extreme damage, which is observed in the insurance data.
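
The stopping-rule effect described for the first chapter can be illustrated with a toy simulation. The sketch below is not the thesis's method or its Python package. It assumes, purely for illustration, exponentially distributed observations with unit scale, a study that is triggered by the first observation above a fixed threshold, and a naive estimate that takes the sample mean (including the trigger event) as the scale. All function names and parameter values are hypothetical. Under these assumptions the naive return level from triggered samples tends to be too high, while samples of fixed length give an essentially unbiased estimate.

```python
import numpy as np


def simulate_stopped_sample(rng, threshold, scale=1.0, max_len=10_000):
    """Draw exponential observations until one exceeds `threshold`.

    The last value is the "trigger event" that stops data collection.
    `max_len` is only a safety cap on the loop.
    """
    values = []
    while len(values) < max_len:
        x = rng.exponential(scale)
        values.append(x)
        if x > threshold:
            break
    return np.array(values)


def return_level(scale_hat, period):
    """Level exceeded with probability 1/period under an exponential model."""
    return scale_hat * np.log(period)


def run_experiment(n_rep=5000, threshold=3.0, fixed_len=20, period=100, seed=0):
    """Compare naive return-level estimates under two sampling schemes."""
    rng = np.random.default_rng(seed)
    stopped = np.empty(n_rep)
    fixed = np.empty(n_rep)
    for i in range(n_rep):
        # Scheme 1: the sample ends at, and includes, the trigger event.
        sample = simulate_stopped_sample(rng, threshold)
        stopped[i] = return_level(sample.mean(), period)
        # Scheme 2: the sample size is fixed in advance.
        fixed[i] = return_level(rng.exponential(1.0, fixed_len).mean(), period)
    return {
        "true": return_level(1.0, period),  # true scale is 1 (assumption)
        "stopped": stopped.mean(),
        "fixed": fixed.mean(),
    }


if __name__ == "__main__":
    for name, value in run_experiment().items():
        print(f"{name:>8s}: {value:.2f}")
```

Comparing the `stopped` and `fixed` averages with the `true` value shows how the sampling scheme alone, with the same underlying distribution and the same estimator, changes the behaviour of the return level estimate.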