Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.
One of the questions that arises in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to tell when a tree algorithm should stop because it is impossible to tell if the addition of a single extra node will dramatically decrease error. This problem is known as the horizon effect. A common strategy is to grow the tree until each node contains a small number of instances then use pruning to remove nodes that do not provide additional information.
Pruning should reduce the size of a learning tree without reducing predictive accuracy as measured by a cross-validation set. There are many techniques for tree pruning that differ in the measurement that is used to optimize performance.
Pruning processes can be divided into two types (pre- and post-pruning).
Pre-pruning procedures prevent a complete induction of the training set by replacing a stop () criterion in the induction algorithm (e.g. max. Tree depth or information gain (Attr)> minGain). Pre-pruning methods are considered to be more efficient because they do not induce an entire set, but rather trees remain small from the start. Prepruning methods share a common problem, the horizon effect. This is to be understood as the undesired premature termination of the induction by the stop () criterion.
Post-pruning (or just pruning) is the most common way of simplifying trees. Here, nodes and subtrees are replaced with leaves to reduce complexity. Pruning can not only significantly reduce the size but also improve the classification accuracy of unseen objects.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Ce cours est divisé en deux partie. La première partie présente le langage Python et les différences notables entre Python et C++ (utilisé dans le cours précédent ICC). La seconde partie est une intro
Machine learning is a sub-field of Artificial Intelligence that allows computers to learn from data, identify patterns and make predictions. As a fundamental building block of the Computational Thinki
Introduction aux techniques de l'Intelligence Artificielle, complémentée par des exercices de programmation qui montrent les algorithmes et des exemples de leur application à des problèmes pratiques.
In the context of Combinatorial game theory, which typically studies sequential games with perfect information, a game tree is a graph representing all possible game states within such a game. Such games include well-known ones such as chess, checkers, Go, and tic-tac-toe. This can be used to measure the complexity of a game, as it represents all the possible ways a game can pan out. Due to the large game trees of complex games such as chess, algorithms that are designed to play this class of games will use partial game trees, which makes computation feasible on modern computers.
To overcome the problem of outlier data in the regression analysis for numerical-based damage spectra, the C4.5 decision tree learning algorithm is used to predict damage in reinforced concrete buildings in future earthquake scenarios. Reinforced concrete ...
Suppose we have randomized decision trees for an outer function f and an inner function g. The natural approach for obtaining a randomized decision tree for the composed function (f∘ gⁿ)(x¹,…,xⁿ) = f(g(x¹),…,g(xⁿ)) involves amplifying the success probabili ...
Schloss Dagstuhl - Leibniz-Zentrum für Informatik2020
,
In this paper a 2-phase decision tree algorithm is developed to qualitatively predict damage in RC buildings based on earthquake characteristics and structural properties. To this end, the structural properties considered are the natural period of the fund ...