Model selection
Model selection is the task of choosing the best model from a set of candidates on the basis of a performance criterion. In the context of learning, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected are well suited to the problem of model selection.
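As a minimal sketch of selecting among candidates by a performance criterion, the example below fits two hypothetical candidate models (a constant and a straight line) and keeps the one with the lowest mean squared error on held-out data. The data, the candidate set, and the choice of held-out error as the criterion are assumptions made for illustration; other criteria could be used instead.

```python
from statistics import mean

def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line (closed form)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Performance criterion: mean squared error on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def select_model(candidates, train, held_out):
    """Fit every candidate on `train`, score on `held_out`, return the best."""
    fitted = {name: fit(*train) for name, fit in candidates.items()}
    scores = {name: mse(model, *held_out) for name, model in fitted.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy data, invented for illustration only.
train = ([0, 1, 2, 3, 4], [0.1, 2.1, 3.9, 6.2, 7.8])
held_out = ([5, 6], [10.1, 11.9])
candidates = {"constant": fit_constant, "line": fit_line}

best_name, scores = select_model(candidates, train, held_out)
print(best_name, scores)
```

Scoring on data that was not used for fitting keeps a more flexible candidate from winning merely because it fits the training points more closely.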
Generative model
In statistical classification, the two main approaches are called the generative approach and the discriminative approach. They compute classifiers in different ways, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished: a generative model is a statistical model of the joint probability distribution on a given observable variable X and target variable Y; a discriminative model is a model of the conditional probability of the target Y, given an observation x; and classifiers computed without using a probability model are also referred to loosely as "discriminative".
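The distinction between the joint distribution and the conditional distribution can be illustrated with a small discrete example. The sketch below estimates the joint distribution of (X, Y) from counts, which is the generative view, and then derives the conditional distribution of Y given X = x from it to classify. The data and function names are hypothetical; a discriminative model would instead estimate the conditional distribution directly, without modelling X.

```python
from collections import Counter

# Toy labelled observations (x, y), invented for illustration only.
data = [("short", "cat"), ("short", "cat"), ("tall", "cat"),
        ("tall", "dog"), ("tall", "dog"), ("short", "dog")]

def joint_distribution(pairs):
    """Generative view: estimate P(X = x, Y = y) by relative frequency."""
    counts = Counter(pairs)
    n = len(pairs)
    return {xy: c / n for xy, c in counts.items()}

def conditional_from_joint(joint, x):
    """Derive P(Y | X = x) = P(x, y) / sum over y' of P(x, y')."""
    marginal = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / marginal for (xi, y), p in joint.items() if xi == x}

joint = joint_distribution(data)
posterior = conditional_from_joint(joint, "tall")

# Classify by picking the most probable target for the observation.
prediction = max(posterior, key=posterior.get)
print(joint)
print(posterior, prediction)
```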
Data wrangling
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" form into another format, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. The goal of data wrangling is to ensure that the data are of good quality and useful. Data analysts typically spend the majority of their time on data wrangling rather than on the actual analysis of the data.
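A minimal sketch of such a transformation is shown below: inconsistently formatted comma-separated strings are mapped to typed records. The field layout (name, age, state) and the cleaning rules are assumptions chosen for illustration, not a general recipe.

```python
# Raw rows with stray whitespace, inconsistent casing, and a missing value.
# Invented for illustration only.
raw_rows = [" Alice , 34 ,  NY", "bob,,ca", "Carol,29,tx "]

def wrangle(rows):
    """Map raw 'name,age,state' strings to cleaned, typed dictionaries."""
    cleaned = []
    for row in rows:
        name, age, state = (field.strip() for field in row.split(","))
        cleaned.append({
            "name": name.title(),               # normalise capitalisation
            "age": int(age) if age else None,   # empty field becomes None
            "state": state.upper(),             # normalise state codes
        })
    return cleaned

records = wrangle(raw_rows)
print(records)
```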
Training, validation, and test data sets
In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from input data. The input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets.
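One common way to obtain the three sets is to shuffle the data once and slice it. The sketch below does this with the standard library; the function name, the 70/15/15 proportions, and the fixed seed are assumptions for illustration rather than recommended values.

```python
import random

def split_dataset(rows, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle `rows` and split them into train, validation, and test lists."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Because the slices come from a single shuffled list, the three sets are disjoint and together cover every input row.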
Data science
Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured, and unstructured data. Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession.
Species
In biology, a species (plural: species) is often defined as the largest group of organisms in which any two individuals of the appropriate sexes or mating types can produce fertile offspring, typically by sexual reproduction. It is the basic unit of classification and a taxonomic rank of an organism, as well as a unit of biodiversity. Other ways of defining species include their karyotype, DNA sequence, morphology, behaviour, or ecological niche. In addition, paleontologists use the concept of the chronospecies since fossil reproduction cannot be examined.
Invasive species
An invasive or alien species is a species introduced to an environment that becomes overpopulated and harms its new environment. Invasive species adversely affect habitats and bioregions, causing ecological, environmental, and/or economic damage. The term can also be used for native species that become harmful to their native environment after human alterations to its food web. An example is the purple sea urchin (Strongylocentrotus purpuratus), which has decimated kelp forests along the northern California coast due to overharvesting of its natural predator, the California sea otter (Enhydra lutris).
Species distribution
Species distribution, or species dispersion, is the manner in which a biological taxon is spatially arranged. The geographic limits of a particular taxon's distribution are its range, often represented as shaded areas on a map. Patterns of distribution change depending on the scale at which they are viewed, from the arrangement of individuals within a small family unit, to patterns within a population, or the distribution of the entire species as a whole (range).
Poisson regression
In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.
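The two assumptions above can be written as log E[Y | x] = a + b·x for a single covariate x. The sketch below fits a and b by maximum likelihood using Newton's method on the Poisson log-likelihood. The single-covariate setup, the parameter names `a` and `b`, the fixed iteration count, and the toy counts are all assumptions for illustration; in practice a statistics library would normally be used.

```python
import math

def fit_poisson_regression(xs, ys, iterations=25):
    """Fit log E[Y | x] = a + b * x by Newton's method (one covariate)."""
    # Start from the intercept-only fit: a = log(mean count), b = 0.
    a = math.log(sum(ys) / len(ys))
    b = 0.0
    for _ in range(iterations):
        mus = [math.exp(a + b * x) for x in xs]   # current expected counts

        # Gradient of the log-likelihood with respect to (a, b).
        g_a = sum(y - mu for y, mu in zip(ys, mus))
        g_b = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))

        # Negative Hessian (Fisher information), a symmetric 2x2 matrix.
        h_aa = sum(mus)
        h_ab = sum(mu * x for x, mu in zip(xs, mus))
        h_bb = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h_aa * h_bb - h_ab * h_ab           # positive unless all xs are equal

        # Newton step: (a, b) += inverse(H) @ gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Toy count data, invented for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 10, 20]

a, b = fit_poisson_regression(xs, ys)
fitted = [math.exp(a + b * x) for x in xs]
print(a, b, fitted)
```

At the maximum-likelihood estimate the gradient is zero, so with an intercept in the model the fitted expected counts sum to the observed counts.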
On the Origin of Species
On the Origin of Species (or, more completely, On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) is a work of scientific literature by Charles Darwin that is considered to be the foundation of evolutionary biology; it was published on 24 November 1859. Darwin's book introduced the scientific theory that populations evolve over the course of generations through a process of natural selection.