Concept: Empirical distribution function

Summary

In statistics, an empirical distribution function (commonly also called an empirical cumulative distribution function, eCDF) is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.
The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution, according to the Glivenko–Cantelli theorem. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.
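As a minimal sketch of the description above, the following Python snippet builds the empirical distribution function of a sample using only the standard library. The names `ecdf` and `F_n` are illustrative choices, not part of the original page. Note that when a value is repeated in the sample, the individual 1/n jumps stack at that point.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F_n, where F_n(t) is the fraction of observations <= t.

    Illustrative helper (hypothetical name), not taken from the source page.
    """
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(t):
        # bisect_right returns the number of sorted observations that are <= t
        return bisect_right(data, t) / n

    return F_n


if __name__ == "__main__":
    F = ecdf([3.0, 1.0, 2.0, 2.0])
    # The value 2.0 occurs twice, so the step there has height 2/4.
    print(F(0.5), F(1.0), F(2.0), F(2.5), F(3.0))  # prints: 0.0 0.25 0.75 0.75 1.0
```

Sorting once and using binary search keeps each evaluation at O(log n), which is convenient when the function is evaluated at many points.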
Definition
Let (X1, …, Xn) be independent, identically distributed real random variables with the common cumula

Related courses (43)

MGT-581: Introduction to econometrics

The course provides an introduction to econometrics. The objective is to learn how to make valid (i.e., causal) inference from economic data. It explains the main estimators and presents methods to deal with endogeneity issues.

MATH-233: Probability and statistics

The course gives an introduction to probability and statistics for physicists.

MICRO-110: Design of experiments

This course provides an introduction to experimental statistics, including the use of population statistics to characterize experimental results, the use of comparison statistics and hypothesis testing to evaluate the validity of experiments, and the design, application, and analysis of multifactorial experiments.

Related concepts (18)

Kolmogorov–Smirnov test

In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions

Probability distribution

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mat

Normal distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function

Related lectures (97)

The reliability of new overhead electric and telecommunication lines depends principally on the quality of their support structures, which are generally made of wood, metal or concrete. The complexity of a natural material such as wood requires a thorough analysis of the factors that influence its overall quality. For wood poles, these factors include the initial forest growth pattern, the wood species and its preservative treatment, ageing characteristics, and mechanical defects such as knots and cracks. Accumulated knowledge of how these variables contribute to the overall quality of a wood support structure permits optimum use of the resource. For example, lower variability and higher strength of wood support structures permit optimum loading and spacing between structures, reducing the number needed for a given length of overhead line. If one assumes that Western Europe uses one wood pole for every two inhabitants, and that this proportion increases in less densely populated regions such as the US and Scandinavia, the economic case for optimum use of wood soon becomes apparent. In less developed countries, the proportions and the economics vary with the natural resources, such as wood, that are employed. The goal of this research is to establish, by means of non-destructive evaluation, a general probabilistic ageing law for wooden poles based on two distinct laws: one for new poles, which studies the effect of grading out the weak elements and is modelled as a left-truncation of a normal distribution (point 1); and one for in-field poles, which exploits parameters such as the age of the pole, its chemical treatment, its species and its knots in order to define the pole's damage law (point 2).
Statistical distribution law of new wooden poles after grading, that is, non-destructive (ultrasonic) sorting of the supports with high mechanical performance: this new distribution law is Gaussian, or evolves to a Log or Weibull law with 3 parameters, depending on the species inspected. The grading allows the properties and design values of new poles to be upgraded while guaranteeing the reliability index required by the design standards, or directly improves this nominal reliability (an economic gain and a reliability gain). Statistical distribution law of an aged in-field population (20-50 years old): this is approximated by a bimodal law that depends on two components. The first is the distribution law of the new component (see point 1) and its minimal extreme law, which is asymmetric, for an observation period of 50 years. The second is the statistical distribution at time t of the residual mechanical performance of a group of supports forming a local network, evaluated by non-destructive methods. The non-destructive evaluation is based on measurements of physical variables (density, biological moisture content) and on descriptive variables of natural origin (diameter, knots, cracks...) and of accidental origin (diameter reduction, lightning cracks...). The statistical distribution at time t is then obtained from a multivariate non-destructive evaluation model, generalized to all species and treatments. This model is the other concrete goal of the thesis. In conclusion, the research demonstrates the influence of the new-pole grading (distribution at t0) on the modelling of the distribution at ti (multivariate non-destructive model), and the interaction between the two. The data used for this modelling come from a large international database covering many inspected wood poles and case studies. This database is the synthesis of about 15 years of research and development led by IBOIS-EPFL and its international partners.
The probabilistic approaches are then validated against a very large database, which makes them directly usable. On this basis, all the standards dealing with new poles and with the inspection and maintenance of wooden pole networks could be re-examined for a double gain. Concerning economy: by increasing the capacity of new poles through objective quality assurance, and by increasing the service life of in-field poles by removing only those that are below the critical damage threshold. Concerning reliability: by increasing the reliability of the network from the "new pole" stage by eliminating the weakest components, and by maintaining this reliability throughout the life of the network through cyclic preventive maintenance (every 5 to 8 years) and the replacement of only the weakened poles.

The estimation of cumulative distributions is classically performed using the empirical distribution function. This estimator has excellent properties but lacks continuity. Smooth versions of the empirical distribution function have been obtained by kernel methods. We develop a new approach to the estimation of cumulative distributions based on spline functions. More specifically, we apply the smoothing spline minimization criterion known from regression to the empirical distribution function $F_n$. The integrated squared error of the estimated function is shown to be of order $O_P\bigl(n^{-1}\bigr)$, and the supremum of the absolute difference between the spline estimate $\hat{F}$ and the underlying distribution function $F$ of order $O_P\bigl(n^{-1/4}\bigr)$. The question of the choice of the smoothing parameter is addressed, and an approach exploiting the connection with the Anderson-Darling statistic is proposed. The estimation procedure does not force the resulting function to be monotone, but it is shown that the probability that $\hat{F}$ is monotone tends to one.

2001
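
The abstract above mentions that smooth versions of the empirical distribution function have been obtained by kernel methods. The following is a minimal sketch of that kernel idea, not of the spline estimator the abstract proposes: each 1/n step is replaced by a Gaussian distribution function centred at the observation. The name `smoothed_ecdf`, the Gaussian kernel, and the `bandwidth` parameter are assumptions made for illustration.

```python
import math


def smoothed_ecdf(sample, bandwidth):
    """Return a kernel-smoothed eCDF: t -> (1/n) * sum(Phi((t - x_i) / bandwidth)).

    Illustrative sketch with a Gaussian kernel (an assumption); this is the
    classical kernel approach, not the smoothing-spline method of the abstract.
    """
    data = list(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    def std_normal_cdf(z):
        # Standard normal distribution function expressed through math.erf
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def F_h(t):
        # Average of smooth steps, one centred at each observation
        return sum(std_normal_cdf((t - x) / bandwidth) for x in data) / n

    return F_h


if __name__ == "__main__":
    F = smoothed_ecdf([1.0, 2.0, 2.0, 3.0], bandwidth=0.5)
    for t in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(t, round(F(t), 3))
```

Because every term in the sum is non-decreasing in t, this estimate is continuous and monotone by construction. As the bandwidth shrinks towards zero, it approaches the step-function eCDF at every point that is not an observation.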