Data PreprocessingData preprocessing can refer to manipulation or dropping of data before it is used in order to ensure or enhance performance, and is an important step in the data mining process. The phrase "garbage in, garbage out" is particularly applicable to data mining and machine learning projects. Data collection methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and missing values, amongst other issues. Analyzing data that has not been carefully screened for such problems can produce misleading results.
CorruptionCorruption is a form of dishonesty or a criminal offense which is undertaken by a person or an organization which is entrusted in a position of authority, in order to acquire illicit benefits or abuse power for one's personal gain. Corruption may involve many activities which include bribery, influence peddling and the embezzlement and it may also involve practices which are legal in many countries. Political corruption occurs when an office-holder or other governmental employee acts with an official capacity for personal gain.
Transformation matrixIn linear algebra, linear transformations can be represented by matrices. If is a linear transformation mapping to and is a column vector with entries, then for some matrix , called the transformation matrix of . Note that has rows and columns, whereas the transformation is from to . There are alternative expressions of transformation matrices involving row vectors that are preferred by some authors. Matrices allow arbitrary linear transformations to be displayed in a consistent format, suitable for computation.
Robust statisticsRobust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution.
Political corruptionPolitical corruption is the use of powers by government officials or their network contacts for illegitimate private gain. Forms of corruption vary, but can include bribery, lobbying, extortion, cronyism, nepotism, parochialism, patronage, influence peddling, graft, and embezzlement. Corruption may facilitate criminal enterprise such as drug trafficking, money laundering, and human trafficking, though it is not restricted to these activities.
Geometric transformationIn mathematics, a geometric transformation is any bijection of a set to itself (or to another such set) with some salient geometrical underpinning. More specifically, it is a function whose domain and range are sets of points — most often both or both — such that the function is bijective so that its inverse exists. The study of geometry may be approached by the study of these transformations. Geometric transformations can be classified by the dimension of their operand sets (thus distinguishing between, say, planar transformations and spatial transformations).
Affine transformationIn Euclidean geometry, an affine transformation or affinity (from the Latin, affinis, "connected with") is a geometric transformation that preserves lines and parallelism, but not necessarily Euclidean distances and angles. More generally, an affine transformation is an automorphism of an affine space (Euclidean spaces are specific affine spaces), that is, a function which maps an affine space onto itself while preserving both the dimension of any affine subspaces (meaning that it sends points to points, lines to lines, planes to planes, and so on) and the ratios of the lengths of parallel line segments.
Large language modelA large language model (LLM) is a language model characterized by its large size. Their size is enabled by AI accelerators, which are able to process vast amounts of text data, mostly scraped from the Internet. The artificial neural networks which are built can contain from tens of millions and up to billions of weights and are (pre-)trained using self-supervised learning and semi-supervised learning. Transformer architecture contributed to faster training.
Robust regressionIn robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise (i.e. are not robust to assumption violations).
Grey box modelIn mathematics, statistics, and computational modelling, a grey box model combines a partial theoretical structure with data to complete the model. The theoretical structure may vary from information on the smoothness of results, to models that need only parameter values from data or existing literature. Thus, almost all models are grey box models as opposed to black box where no model form is assumed or white box models that are purely theoretical. Some models assume a special form such as a linear regression or neural network.