Deep reinforcement learningDeep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g.
Statistical modelA statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process. When referring specifically to probabilities, the corresponding term is probabilistic model. A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables.
Stochastic gradient descentStochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
Polynomial hierarchyIn computational complexity theory, the polynomial hierarchy (sometimes called the polynomial-time hierarchy) is a hierarchy of complexity classes that generalize the classes NP and co-NP. Each class in the hierarchy is contained within PSPACE. The hierarchy can be defined using oracle machines or alternating Turing machines. It is a resource-bounded counterpart to the arithmetical hierarchy and analytical hierarchy from mathematical logic. The union of the classes in the hierarchy is denoted PH.
Hermite polynomialsIn mathematics, the Hermite polynomials are a classical orthogonal polynomial sequence. The polynomials arise in: signal processing as Hermitian wavelets for wavelet transform analysis probability, such as the Edgeworth series, as well as in connection with Brownian motion; combinatorics, as an example of an Appell sequence, obeying the umbral calculus; numerical analysis as Gaussian quadrature; physics, where they give rise to the eigenstates of the quantum harmonic oscillator; and they also occur in some cases of the heat equation (when the term is present); systems theory in connection with nonlinear operations on Gaussian noise.
Romanovski polynomialsIn mathematics, the Romanovski polynomials are one of three finite subsets of real orthogonal polynomials discovered by Vsevolod Romanovsky (Romanovski in French transcription) within the context of probability distribution functions in statistics. They form an orthogonal subset of a more general family of little-known Routh polynomials introduced by Edward John Routh in 1884. The term Romanovski polynomials was put forward by Raposo, with reference to the so-called 'pseudo-Jacobi polynomials in Lesky's classification scheme.
Joint probability distributionGiven two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considered for any given number of random variables. The joint distribution encodes the marginal distributions, i.e. the distributions of each of the individual random variables. It also encodes the conditional probability distributions, which deal with how the outputs of one random variable are distributed when given information on the outputs of the other random variable(s).
Q-learningQ-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
Large language modelA large language model (LLM) is a language model characterized by its large size. Their size is enabled by AI accelerators, which are able to process vast amounts of text data, mostly scraped from the Internet. The artificial neural networks which are built can contain from tens of millions and up to billions of weights and are (pre-)trained using self-supervised learning and semi-supervised learning. Transformer architecture contributed to faster training.
Generalized functionIn mathematics, generalized functions are objects extending the notion of functions. There is more than one recognized theory, for example the theory of distributions. Generalized functions are especially useful in making discontinuous functions more like smooth functions, and describing discrete physical phenomena such as point charges. They are applied extensively, especially in physics and engineering. A common feature of some of the approaches is that they build on operator aspects of everyday, numerical functions.