Cumulative distributions are classically estimated using the empirical distribution function. This estimator has excellent properties but lacks continuity. Smooth versions of the empirical distribution function have been obtained by kernel methods. We apply the smoothing spline minimization criterion, known from regression, to the empirical distribution function. The smoothing parameter is chosen by an approach that exploits the connection with the Anderson-Darling statistic. A small simulation study shows that the new estimator behaves similarly to the kernel distribution function estimator. An application to several datasets assesses the estimator's usefulness in data analysis. Finally, the estimation procedure is applied to the smoothing of the Kaplan-Meier survival function estimator.
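To make the contrast between the two baseline estimators concrete, the following minimal sketch compares the empirical distribution function (a step function) with a Gaussian-kernel distribution function estimate (a smooth curve). It is not the smoothing spline estimator proposed here. The function names `ecdf` and `kernel_cdf`, the simulated sample, and the bandwidth value `h = 0.3` are assumptions made purely for illustration.

```python
import bisect
import math
import random


def ecdf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    ordered = sorted(sample)
    return bisect.bisect_right(ordered, x) / len(ordered)


def kernel_cdf(sample, x, bandwidth):
    """Gaussian-kernel estimate: average of Phi((x - X_i) / bandwidth)."""
    scale = bandwidth * math.sqrt(2.0)
    total = sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in sample)
    return total / len(sample)


# Hypothetical data: 200 draws from a standard normal distribution.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

h = 0.3  # assumed bandwidth, chosen arbitrarily for this illustration

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"x={x:+.1f}  ecdf={ecdf(sample, x):.3f}  kernel={kernel_cdf(sample, x, h):.3f}")
```

The step estimate jumps by 1/n at each observation, while the kernel estimate replaces each jump with a smooth Gaussian distribution function of width `h`. As `h` shrinks toward zero, the kernel estimate approaches the empirical distribution function.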
Giancarlo Ferrari Trecate, Florian Dörfler, Jean-Sébastien Hubert Brouillon
Victor Panaretos, Laya Ghodrati