Convex polytopeA convex polytope is a special case of a polytope, having the additional property that it is also a convex set contained in the -dimensional Euclidean space . Most texts use the term "polytope" for a bounded convex polytope, and the word "polyhedron" for the more general, possibly unbounded object. Others (including this article) allow polytopes to be unbounded. The terms "bounded/unbounded convex polytope" will be used below whenever the boundedness is critical to the discussed issue.
Markov decision processIn mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes.
SystemA system is a group of interacting or interrelated elements that act according to a set of rules to form a unified whole. A system, surrounded and influenced by its environment, is described by its boundaries, structure and purpose and is expressed in its functioning. Systems are the subjects of study of systems theory and other systems sciences. Systems have several common properties and characteristics, including structure, function(s), behavior and interconnectivity.
BIBO stabilityIn signal processing, specifically control theory, bounded-input, bounded-output (BIBO) stability is a form of stability for signals and systems that take inputs. If a system is BIBO stable, then the output will be bounded for every input to the system that is bounded. A signal is bounded if there is a finite value such that the signal magnitude never exceeds , that is For discrete-time signals: For continuous-time signals: For a continuous time linear time-invariant (LTI) system, the condition for BIBO stability is that the impulse response, , be absolutely integrable, i.
Squared deviations from the meanSquared deviations from the mean (SDM) result from squaring deviations. In probability theory and statistics, the definition of variance is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for analysis of variance involve the partitioning of a sum of SDM. An understanding of the computations involved is greatly enhanced by a study of the statistical value where is the expected value operator.
Digital controlDigital control is a branch of control theory that uses digital computers to act as system controllers. Depending on the requirements, a digital control system can take the form of a microcontroller to an ASIC to a standard desktop computer. Since a digital computer is a discrete system, the Laplace transform is replaced with the Z-transform. Since a digital computer has finite precision (See quantization), extra care is needed to ensure the error in coefficients, analog-to-digital conversion, digital-to-analog conversion, etc.
Pointwise convergenceIn mathematics, pointwise convergence is one of various senses in which a sequence of functions can converge to a particular function. It is weaker than uniform convergence, to which it is often compared. Suppose that is a set and is a topological space, such as the real or complex numbers or a metric space, for example. A net or sequence of functions all having the same domain and codomain is said to converge pointwise to a given function often written as if (and only if) The function is said to be the pointwise limit function of the Sometimes, authors use the term bounded pointwise convergence when there is a constant such that .
Trace (linear algebra)In linear algebra, the trace of a square matrix A, denoted tr(A), is defined to be the sum of elements on the main diagonal (from the upper left to the lower right) of A. The trace is only defined for a square matrix (n × n). It can be proven that the trace of a matrix is the sum of its (complex) eigenvalues (counted with multiplicities). It can also be proven that tr(AB) = tr(BA) for any two matrices A and B. This implies that similar matrices have the same trace.
Reinforcement learningReinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected.
Outer productIn linear algebra, the outer product of two coordinate vectors is the matrix whose entries are all products of an element in the first vector with an element in the second vector. If the two coordinate vectors have dimensions n and m, then their outer product is an n × m matrix. More generally, given two tensors (multidimensional arrays of numbers), their outer product is a tensor. The outer product of tensors is also referred to as their tensor product, and can be used to define the tensor algebra.