Training, validation, and test data sets: In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from input data. These input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets.
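A three-way split of this kind can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `split_dataset`, the 70/15/15 default fractions, and the fixed seed are all assumptions chosen for the example.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of `samples` and cut it into train/validation/test lists.

    The fractions are illustrative defaults (an assumption); whatever is
    left after the training and validation shares becomes the test set.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```

The three lists are disjoint and together cover every input sample, so no example used for fitting is reused for validation or final testing.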
Machine learning: Machine learning (ML) is an umbrella term for solving problems for which having human programmers develop the algorithms would be cost-prohibitive; instead, the problems are solved by helping machines 'discover' their 'own' algorithms, without being explicitly told what to do by any human-developed algorithm. Recently, generative artificial neural networks have surpassed the results of many previous approaches.
Anomaly detection: In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events, or observations that deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behaviour. Such examples may arouse suspicion of having been generated by a different mechanism, or may appear inconsistent with the remainder of the data set.
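One simple way to flag observations that deviate from the majority is a z-score rule, sketched below. This is only one of many possible detectors and is an assumption of this example, as are the function name `zscore_outliers`, the threshold, and the sample readings.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample standard
    deviations from the mean. Illustrative rule, not a general detector."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one value far from the rest.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.7, 10.3, 25.0]
outliers = zscore_outliers(readings, threshold=2.5)
```

The threshold is set to 2.5 rather than the conventional 3 because, in a sample of 10 values, no point can lie more than about 2.85 sample standard deviations from the mean. The outlier itself inflates the mean and standard deviation, which is why robust variants based on the median are often preferred.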
Lifelong learning: Lifelong learning is the "ongoing, voluntary, and self-motivated" pursuit of knowledge for either personal or professional reasons. It is important for an individual's competitiveness and employability, but also enhances social inclusion, active citizenship, and personal development. In some contexts, the term "lifelong learning" evolved from the term "life-long learners", created by Leslie Watkins and used by Clint Taylor, professor at CSULA and Superintendent for the Temple City Unified School District, in the district's mission statement in 1993. The term recognizes that learning is not confined to childhood or the classroom but takes place throughout life and in a range of situations.
Cross-validation (statistics): Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to train and test a model on different iterations. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
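The idea of rotating which portion of the data is held out can be sketched with k-fold cross-validation, one common variant. The helper names `k_fold_indices` and `cross_val_mse`, the mean-predictor "model", and the toy targets are assumptions made for illustration; the folds are contiguous and unshuffled to keep the sketch short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so each sample is held out exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_mse(ys, k):
    """Average held-out mean squared error of a model that always predicts
    the mean of its training targets (a deliberately trivial stand-in)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train_idx) / len(train_idx)
        mse = sum((ys[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        scores.append(mse)
    return sum(scores) / len(scores)

targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
score = cross_val_mse(targets, k=3)
```

Each iteration fits on k-1 folds and scores on the remaining one, and the averaged score serves as the estimate of out-of-sample performance. In practice the data are usually shuffled before the folds are cut.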
Random forest: Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set.
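The aggregation step described here, a majority vote for classification and a mean for regression, can be sketched on its own. The sketch assumes the per-tree predictions are already available and does not build any trees; the function names and sample predictions are hypothetical.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_regress(tree_predictions):
    """Return the mean of the individual trees' numeric predictions."""
    return mean(tree_predictions)

# Hypothetical outputs from five classification trees and three regression trees.
label = forest_classify(["cat", "dog", "cat", "cat", "dog"])
value = forest_regress([2.0, 3.0, 4.0])
```

Averaging over many trees, each trained on a different random view of the data, is what dampens the overfitting of any single tree.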
Document classification: Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science.
Social class: A social class or social stratum is a grouping of people into a set of hierarchical social categories, the most common being the upper, middle and lower classes. Membership in a social class can for example be dependent on education, wealth, occupation, income, and belonging to a particular subculture or social network. "Class" is a subject of analysis for sociologists, political scientists, anthropologists and social historians. The term has a wide range of sometimes conflicting meanings, and there is no broad consensus on a definition of "class".
Unsupervised learning: Unsupervised learning is a paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data. Neural network tasks are often categorized as discriminative (recognition) or generative (imagination). Often but not always, discriminative tasks use supervised methods and generative tasks use unsupervised ones; however, the separation is very hazy. For example, object recognition favors supervised learning, but unsupervised learning can also cluster objects into groups.
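Clustering unlabeled data into groups can be illustrated with a small one-dimensional k-means loop. K-means is an assumption of this sketch, chosen as a familiar unsupervised method; the function name `kmeans_1d`, the starting centroids, and the sample points are hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Run Lloyd's k-means iterations on 1-D points.

    No labels are used: each point is assigned to its nearest centroid,
    then each centroid moves to the mean of its assigned points.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

The two groups emerge from the structure of the data alone. The result depends on the starting centroids, which is why practical implementations usually try several initializations.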
Middle class: The middle class refers to a class of people in the middle of a social hierarchy, often defined by occupation, income, education, or social status. The term has historically been associated with modernity, capitalism and political debate. Common definitions for the middle class range from the middle fifth of individuals on a nation's income ladder, to everyone but the poorest and wealthiest 20%. Theories like "Paradox of Interest" use decile groups and wealth distribution data to determine the size and wealth share of the middle class.