In statistics and machine learning, double descent is the phenomenon where a model with a small number of parameters and a model with a very large number of parameters both achieve low test error, while a model whose number of parameters is roughly equal to the number of training data points suffers a large test error. The phenomenon was identified around 2018, when researchers sought to reconcile the bias-variance tradeoff of classical statistics, which predicts that models with too many parameters generalize poorly, with the empirical observation of 2010s machine learning practice that larger models tend to perform better. The scaling behavior of double descent has been found to follow a broken neural scaling law functional form.
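The shape of the curve can be reproduced in simple settings. The following sketch (an illustration, not drawn from any particular study) fits a random-feature regression model by minimum-norm least squares and sweeps the number of random features past the number of training samples; the training data, the ReLU feature map, the noise level, and all sizes are assumptions chosen only to make the effect visible.

```python
# Minimal double-descent demonstration with random-feature regression.
# All data and parameter choices are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d = 100, 1000, 5               # sample sizes, input dimension
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)  # noisy labels
y_test = X_test @ w_true                                      # noiseless targets

def random_relu_features(X, W):
    """Random ReLU feature map: phi(x) = max(0, W x)."""
    return np.maximum(X @ W.T, 0.0)

test_errors = {}
for p in [10, 50, 90, 100, 110, 200, 500, 2000]:  # number of random features
    W = rng.normal(size=(p, d))
    Phi_train = random_relu_features(X_train, W)
    Phi_test = random_relu_features(X_test, W)
    # Least-squares fit; np.linalg.lstsq returns the minimum-norm solution
    # when the system is underdetermined (p > n_train).
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_errors[p] = np.mean((Phi_test @ coef - y_test) ** 2)

for p, err in test_errors.items():
    print(f"p = {p:5d}  test MSE = {err:.3f}")
# Test error typically rises sharply as p approaches n_train (= 100),
# then falls again for much larger p -- the double-descent shape.
```

In this setup the peak near p = n_train reflects the interpolation threshold: the model is just barely able to fit the noisy training labels exactly, which makes the fitted coefficients highly sensitive to the noise, whereas far more parameters allow a low-norm interpolating solution that generalizes better.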