The estimation of cumulative distributions is classically performed using the empirical distribution function. This estimator has excellent properties but lacks continuity. Smooth versions of the empirical distribution function have been obtained by kernel methods. We develop a new approach to the estimation of cumulative distributions based on spline functions. More specifically, we apply the smoothing spline minimization criterion known from regression to the empirical distribution function. The integrated squared error of the estimated function is shown to be of order and the supremum of the absolute difference of and of order . The question of how to choose the smoothing parameter is addressed, and an approach exploiting the connection with the Anderson–Darling statistic is proposed. The estimation procedure does not force the resulting function to be monotone, but it is shown that the probability of it being monotone tends to one.
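To make the two background estimators concrete, the following minimal sketch contrasts the empirical distribution function (a step function) with a kernel-smoothed version. It does not implement the spline estimator of the paper. The function names `ecdf` and `kernel_cdf`, the Gaussian kernel, the bandwidth value, and the toy sample are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # Fraction of observations less than or equal to x (a step function).
        return bisect_right(xs, x) / n

    return F_n


def kernel_cdf(sample, bandwidth):
    """Return a Gaussian-kernel smoothed distribution estimate (illustrative choice)."""
    xs = list(sample)
    n = len(xs)

    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def F_h(x):
        # Average of normal distribution functions centred at each observation.
        return sum(std_normal_cdf((x - xi) / bandwidth) for xi in xs) / n

    return F_h


# Toy data and bandwidth, chosen only for illustration.
sample = [0.3, 1.2, 1.9, 2.4, 3.1]
F_n = ecdf(sample)
F_h = kernel_cdf(sample, bandwidth=0.5)
```

Evaluating `F_n` just below and at an observation shows a jump of size 1/n, whereas `F_h` changes continuously. A spline-based estimator as described in the abstract would aim for the same kind of smoothness through a penalized criterion rather than through kernel averaging.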