Multi-armed bandit: In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed, limited set of resources must be allocated among competing (alternative) choices so as to maximize the expected gain, when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff.
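To make the exploration–exploitation tradeoff concrete, here is a minimal sketch of one common bandit strategy, epsilon-greedy. It is only an illustration, not the definitive solution to the problem: the function name `epsilon_greedy`, the Bernoulli (0/1) rewards, and the parameter values are all assumptions made for this example.

```python
import random

def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy strategy (illustrative)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # how often each arm was pulled
    estimates = [0.0] * k     # running mean reward observed per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore: random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit: best estimate
        # Hypothetical environment: reward 1 with probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

counts, estimates, total_reward = epsilon_greedy([0.1, 0.5, 0.9])
```

With a small epsilon the player spends most pulls on the arm it currently believes is best, while the occasional random pull keeps refining the estimates of the other arms.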
Multiplayer video game: A multiplayer video game is a video game in which more than one person can play in the same game environment at the same time, either locally on the same computing system (couch co-op), on different computing systems via a local area network, or via a wide area network, most commonly the Internet (e.g. World of Warcraft, Call of Duty, DayZ). Multiplayer games usually require players to share a single game system or use networking technology to play together over a greater distance; players may compete against one or more human contestants, work cooperatively with a human partner to achieve a common goal, or supervise other players' activity.
Stochastic gradient descent: Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
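The idea of replacing the full gradient with an estimate from a random subset can be sketched as follows. This is a minimal example under stated assumptions: the objective is mean squared error for a one-dimensional linear model, and the function name `sgd_linear_fit`, the learning rate, the batch size, and the synthetic noise-free data are all chosen for illustration.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.1, epochs=100, batch_size=10, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on mean squared error (illustrative)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]   # d/dw of err**2
                grad_b += 2.0 * err           # d/db of err**2
            # Step along the gradient estimated from this batch only.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b

# Synthetic data generated from y = 2x + 1 (an assumption for the demo).
xs = [i / 100.0 - 1.0 for i in range(200)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
```

Each update uses only `batch_size` samples, so it is cheaper than a full-gradient step but noisier; on this noise-free data the estimates should settle close to the generating values 2 and 1.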
Software agent: In computer science, a software agent or software AI is a computer program that acts for a user or other program in a relationship of agency, which derives from the Latin agere (to do): an agreement to act on one's behalf. Such "action on behalf of" implies the authority to decide which, if any, action is appropriate. Some agents are colloquially known as bots, from robot. They may be embodied, as when execution is paired with a robot body, or as software such as a chatbot executing on a phone (e.g.
Q-learning: Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
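A minimal sketch of tabular Q-learning on a toy problem may help. Everything about the environment here is an assumption made for illustration: a five-state chain with a reward of 1 for reaching the rightmost state, a uniformly random behaviour policy, and the hypothetical names `step` and `q_learning`. The learner only sees sampled transitions and never consults a model of the chain.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (toy assumption)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment: deterministic chain, reward 1 at the right end."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behave randomly; the update below still learns greedy values.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy read off the learned table for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

In this toy chain the greedy policy derived from the table should choose "right" in every non-terminal state, since that is the shortest path to the only reward.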
Sparse dictionary learning: Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements, as well as those basic elements themselves. These elements are called atoms, and together they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may form an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than that of the signals being observed.
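The sketch below illustrates only the sparse-representation half of the problem: the dictionary is fixed by hand rather than learned, and the coding routine is plain matching pursuit, one of several possible choices. The function name `matching_pursuit` and the three-atom dictionary in two dimensions are assumptions for illustration; the atoms are unit-norm, non-orthogonal, and over-complete.

```python
import math

def matching_pursuit(signal, atoms, max_iters=10, tol=1e-9):
    """Greedy sparse coding over a fixed dictionary of unit-norm atoms."""
    residual = list(signal)
    code = [0.0] * len(atoms)   # one coefficient per atom
    for _ in range(max_iters):
        if math.sqrt(sum(r * r for r in residual)) < tol:
            break               # signal is fully explained
        # Correlate the residual with every atom and pick the best match.
        correlations = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(correlations[j]))
        code[best] += correlations[best]
        # Remove the chosen atom's contribution from the residual.
        residual = [r - correlations[best] * a
                    for r, a in zip(residual, atoms[best])]
    return code, residual

s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]   # 3 atoms in 2-D: over-complete
signal = [3.0 * s, 3.0 * s]                # lies along the diagonal atom
code, residual = matching_pursuit(signal, atoms)
```

Using only the two orthogonal atoms, this signal would need two nonzero coefficients; with the extra diagonal atom available, a single coefficient suffices, which is the kind of sparsity an over-complete dictionary is meant to enable.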
Motivation: Motivation is the reason for which humans and other animals initiate, continue, or terminate a behavior at a given time. Motivational states are commonly understood as forces acting within the agent that create a disposition to engage in goal-directed behavior. It is often held that different mental states compete with each other and that only the strongest state determines behavior. This means that we can be motivated to do something without actually doing it. The paradigmatic mental state providing motivation is desire.
Learning: Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences. The ability to learn is possessed by humans, animals, and some machines; there is also evidence for some kind of learning in certain plants. Some learning is immediate, induced by a single event (e.g. being burned by a hot stove), but much skill and knowledge accumulate from repeated experiences. The changes induced by learning often last a lifetime, and it is hard to distinguish learned material that seems to be "lost" from that which cannot be retrieved.
Slavic languages: The Slavic languages, also known as the Slavonic languages, are Indo-European languages spoken primarily by the Slavic peoples and their descendants. They are thought to descend from a proto-language called Proto-Slavic, spoken during the Early Middle Ages, which in turn is thought to have descended from the earlier Proto-Balto-Slavic language, linking the Slavic languages to the Baltic languages in a Balto-Slavic group within the Indo-European family.
Present value: In economics and finance, present value (PV), also known as present discounted value, is the value of an expected income stream determined as of the date of valuation. The present value is usually less than the future value because money has interest-earning potential, a characteristic referred to as the time value of money. The exception is during times of zero or negative interest rates, when the present value will be equal to or greater than the future value. Time value can be described with the simplified phrase, "A dollar today is worth more than a dollar tomorrow".
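The relationship between present and future value can be sketched with the standard discounting formula, in which a payment C received t periods from now is worth C / (1 + r)^t today. The function name `present_value` is hypothetical, and the example assumes a constant rate per period with each payment arriving at the end of its period.

```python
def present_value(cash_flows, rate):
    """Discount a stream of end-of-period payments at a constant rate."""
    return sum(cf / (1.0 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# One payment due one period from now, under three rate regimes.
pv_positive = present_value([110.0], 0.10)    # positive rate: PV below 110
pv_zero = present_value([100.0], 0.0)         # zero rate: PV equals the payment
pv_negative = present_value([100.0], -0.05)   # negative rate: PV above 100
```

The three calls mirror the cases in the text: with a positive rate the present value is below the future payment, at a zero rate the two are equal, and at a negative rate the present value exceeds it.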