Training, validation, and test data sets
In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from input data. These input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets.
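As a minimal sketch of such a division, the NumPy snippet below shuffles a synthetic dataset and carves it into training, validation, and test sets; the 60/20/20 proportions and the random data are illustrative assumptions, not part of the definition:

```python
import numpy as np

# Synthetic dataset: 1000 samples, 5 features (illustrative only).
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# Shuffle indices, then split 60% train / 20% validation / 20% test.
indices = rng.permutation(len(X))
n_train = int(0.6 * len(X))
n_val = int(0.2 * len(X))

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

The training set fits the model, the validation set guides choices such as hyperparameters, and the test set is held back for a final, unbiased assessment.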
Rain
Rain is water droplets that have condensed from atmospheric water vapor and then fall under gravity. Rain is a major component of the water cycle and is responsible for depositing most of the fresh water on the Earth. It provides water for hydroelectric power plants, crop irrigation, and suitable conditions for many types of ecosystems. The major cause of rain production is moisture moving along three-dimensional zones of temperature and moisture contrasts known as weather fronts.
Rain gauge
A rain gauge (also known as a udometer, pluviometer, pluviameter, ombrometer, or hyetometer) is an instrument used by meteorologists and hydrologists to gather and measure the amount of liquid precipitation over a predefined area over a period of time. It is used to determine the depth of precipitation (usually in mm) that falls over a unit area, and thus the rainfall amount. The first known rainfall records were kept by the Ancient Greeks, at around 500 BCE.
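The depth-over-area relationship is simple arithmetic: depth equals collected volume divided by the collector's opening area. A small sketch, assuming a circular funnel of hypothetical dimensions:

```python
import math

def rain_depth_mm(volume_ml: float, funnel_diameter_cm: float) -> float:
    """Convert collected water volume to precipitation depth in mm.

    Assumes a circular collection funnel; since 1 mL = 1 cm^3 and
    1 cm = 10 mm, depth_mm = volume / area * 10.
    """
    area_cm2 = math.pi * (funnel_diameter_cm / 2) ** 2
    return volume_ml / area_cm2 * 10

# Example: 157.1 mL caught by a 10 cm diameter funnel ~= 20 mm of rain.
print(round(rain_depth_mm(157.1, 10.0), 1))
```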
Estimator
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the sample mean is a commonly used estimator of the population mean. There are point and interval estimators. The point estimators yield single-valued results. This is in contrast to an interval estimator, where the result would be a range of plausible values.
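The contrast is easy to show in code. Below, a synthetic sample (an illustrative assumption) yields a point estimate via the sample mean and an interval estimate via a normal-approximation 95% confidence interval:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=5.0, scale=2.0, size=200)  # synthetic data

# Point estimator: the sample mean estimates the population mean.
mean_hat = sample.mean()

# Interval estimator: a 95% normal-approximation confidence interval.
se = sample.std(ddof=1) / np.sqrt(len(sample))
ci = (mean_hat - 1.96 * se, mean_hat + 1.96 * se)

print(f"point estimate: {mean_hat:.3f}")
print(f"95% interval estimate: ({ci[0]:.3f}, {ci[1]:.3f})")
```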
Statistical model validation
In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate. Oftentimes in statistical inference, inferences from models that appear to fit their data may be flukes, leading researchers to misjudge the actual relevance of their model. To combat this, model validation is used to test whether a statistical model can hold up to permutations in the data.
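One common form of validation is a holdout check: fit the model on part of the data and see whether its error holds up on data it has not seen. A minimal sketch with a synthetic linear dataset (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
x = np.linspace(0, 10, 200)
y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)  # synthetic data

# Fit a candidate model (a straight line) on a random half of the data.
idx = rng.permutation(x.size)
fit_idx, hold_idx = idx[:100], idx[100:]
slope, intercept = np.polyfit(x[fit_idx], y[fit_idx], deg=1)

def rmse(xs, ys):
    """Root-mean-square error of the fitted line on a given subset."""
    return np.sqrt(np.mean((ys - (slope * xs + intercept)) ** 2))

# Comparable error on the held-out half suggests the apparent fit is
# not a fluke of the particular sample it was estimated on.
print(f"fit RMSE:      {rmse(x[fit_idx], y[fit_idx]):.3f}")
print(f"held-out RMSE: {rmse(x[hold_idx], y[hold_idx]):.3f}")
```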
Degrees of freedom (statistics)
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself.
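A worked illustration with a made-up five-value sample: deviations from the sample mean always sum to zero, so once n - 1 of them are known the last is determined; estimating the mean costs one degree of freedom, which is why the unbiased sample variance divides by n - 1:

```python
import numpy as np

sample = np.array([4.0, 7.0, 6.0, 5.0, 8.0])  # n = 5 observations
deviations = sample - sample.mean()

# The deviations sum to zero: only n - 1 of them are free to vary.
print(np.isclose(deviations.sum(), 0.0))   # True

# Unbiased sample variance (divide by n - 1) vs. the biased version
# (divide by n): 2.5 vs. 2.0 for this sample.
print(sample.var(ddof=1), sample.var(ddof=0))
```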
Ordinary least squares
In statistics, ordinary least squares (OLS) is a linear least squares method for choosing the unknown parameters in a linear regression model (one with fixed level-one effects of a linear function of a set of explanatory variables). It selects the parameters by the principle of least squares: minimizing the sum of the squares of the differences between the observed values of the dependent variable in the input dataset and the values predicted by the linear function of the independent variables.
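A minimal sketch of that minimization, assuming a synthetic dataset with an intercept and one regressor; `np.linalg.lstsq` solves the least-squares problem directly, which is numerically preferable to explicitly forming the normal equations (X'X)^{-1}X'y:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
X = np.column_stack([np.ones(100), rng.uniform(0, 10, size=100)])
beta_true = np.array([2.0, 0.5])              # assumed true parameters
y = X @ beta_true + rng.normal(scale=1.0, size=100)

# OLS: choose beta to minimize ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [2.0, 0.5]
```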
Boundary problem (spatial analysis)
A boundary problem in spatial analysis is a phenomenon in which geographical patterns are differentiated by the shape and arrangement of the boundaries drawn for administrative or measurement purposes. The boundary problem occurs because analyses that depend on the values of neighbors lose those neighbors at the boundary. While geographic phenomena are measured and analyzed within a specific unit, identical spatial data can appear either dispersed or clustered depending on the boundary placed around the data.
Regression validation
In statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation process can involve analyzing the goodness of fit of the regression, analyzing whether the regression residuals are random, and checking whether the model's predictive performance deteriorates substantially when applied to data that were not used in model estimation.
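The sketch below, on a synthetic linear dataset (an illustrative assumption), touches the latter two checks: a residual randomness diagnostic and an out-of-sample error comparison:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.uniform(0, 10, size=150)
y = 1.5 * x + 4.0 + rng.normal(scale=1.0, size=x.size)  # synthetic data

# Estimate on the first 100 points; reserve the last 50 for validation.
slope, intercept = np.polyfit(x[:100], y[:100], deg=1)
resid = y[:100] - (slope * x[:100] + intercept)

# Residual diagnostics: ordered by x, random residuals should show a
# near-zero mean and weak lag-1 autocorrelation.
ordered = resid[np.argsort(x[:100])]
lag1 = np.corrcoef(ordered[:-1], ordered[1:])[0, 1]
print(f"residual mean: {resid.mean():.3f}, lag-1 autocorr: {lag1:.3f}")

# Predictive check: error on data not used in estimation should not be
# substantially worse than the in-sample error.
rmse_in = np.sqrt(np.mean(resid ** 2))
rmse_out = np.sqrt(np.mean((y[100:] - (slope * x[100:] + intercept)) ** 2))
print(f"in-sample RMSE: {rmse_in:.3f}, out-of-sample RMSE: {rmse_out:.3f}")
```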
Edge-preserving smoothing
Edge-preserving smoothing or edge-preserving filtering is a technique that smooths away noise or textures while retaining sharp edges. Examples are the median, bilateral, guided, anisotropic diffusion, and Kuwahara filters. In many applications, e.g., medical or satellite imaging, the edges are key features and thus must be kept sharp and undistorted during smoothing/denoising. Edge-preserving filters are designed to automatically limit the smoothing at “edges” in images, measured, e.g., by high gradient magnitudes.
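A minimal sketch of the simplest listed filter, a median filter, applied to a synthetic 1D step signal (the signal and window width are illustrative assumptions); unlike a moving average, the sliding median removes noise without blurring the step:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
signal = np.concatenate([np.zeros(50), np.ones(50)])  # sharp edge at index 50
noisy = signal + rng.normal(scale=0.1, size=signal.size)

def median_filter(x, width=5):
    """Sliding-window median: smooths noise but keeps the step sharp."""
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)

smoothed = median_filter(noisy)
# The jump from ~0 to ~1 stays abrupt rather than being smeared
# across the window width.
print(smoothed[47:53].round(2))
```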