Q-learningQ-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
Risk aversionIn economics and finance, risk aversion is the tendency of people to prefer outcomes with low uncertainty to those outcomes with high uncertainty, even if the average outcome of the latter is equal to or higher in monetary value than the more certain outcome. Risk aversion explains the inclination to agree to a situation with a more predictable, but possibly lower payoff, rather than another situation with a highly unpredictable, but possibly higher payoff.
Learning to rankLearning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item.
Multi-armed banditIn probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma.
Pattern recognitionPattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.
ParadigmIn science and philosophy, a paradigm (ˈpærədaɪm ) is a distinct set of concepts or thought patterns, including theories, research methods, postulates, and standards for what constitute legitimate contributions to a field. The word paradigm is Greek in origin, meaning "pattern", and is used to illustrate similar occurrences. Paradeigma Paradigm comes from Greek παράδειγμα (paradeigma), "pattern, example, sample" from the verb παραδείκνυμι (paradeiknumi), "exhibit, represent, expose" and that from παρά (para), "beside, beyond" and δείκνυμι (deiknumi), "to show, to point out".
Paradigm shiftA paradigm shift is a fundamental change in the basic concepts and experimental practices of a . It is a concept in the philosophy of science that was introduced and brought into the common lexicon by the American physicist and philosopher Thomas Kuhn. Even though Kuhn restricted the use of the term to the natural sciences, the concept of a paradigm shift has also been used in numerous non-scientific contexts to describe a profound change in a fundamental model or perception of events.
Bayes estimatorIn estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function (i.e., the posterior expected loss). Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation. Suppose an unknown parameter is known to have a prior distribution .
Merit orderThe merit order is a way of ranking available sources of energy, especially electrical generation, based on ascending order of price (which may reflect the order of their short-run marginal costs of production) and sometimes pollution, together with amount of energy that will be generated. In a centralized management, the ranking is so that those with the lowest marginal costs are the first ones to be brought online to meet demand, and the plants with the highest marginal costs are the last to be brought on line.
Scenario optimizationThe scenario approach or scenario optimization approach is a technique for obtaining solutions to robust optimization and chance-constrained optimization problems based on a sample of the constraints. It also relates to inductive reasoning in modeling and decision-making. The technique has existed for decades as a heuristic approach and has more recently been given a systematic theoretical foundation. In optimization, robustness features translate into constraints that are parameterized by the uncertain elements of the problem.