Q-learning
Q-learning is a model-free reinforcement learning algorithm that learns the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without adaptation. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state.
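As a rough illustration of the idea, the sketch below runs tabular Q-learning on a tiny chain of states. Everything about the environment is an assumption made for this example and is not part of the description above: the five-state chain, the two actions (left and right), the reward of 1 for reaching the last state, the epsilon-greedy exploration rule, and the values chosen for `alpha`, `gamma`, and `epsilon`. The learner never consults a transition model; it only observes the next state and reward after acting.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain; returns the Q-table."""
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    goal = n_states - 1                   # reaching this state ends an episode
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy choice, breaking ties between equal values at random.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice(
                    [i for i, value in enumerate(q[state]) if value == best])

            # The environment responds; the learner only sees the outcome.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: move the estimate toward
            # reward + gamma * (best estimated value in the next state).
            # The goal row is never updated, so its value stays 0.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy read off the learned table (1 means "move right").
policy = [row.index(max(row)) for row in q_table[:-1]]
```

In this toy setting the greedy policy read from the table moves right in every non-terminal state, which is the optimal behaviour for the assumed reward.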
Telecommunications network
A telecommunications network is a group of nodes interconnected by telecommunications links that are used to exchange messages between the nodes. The links may use a variety of technologies based on circuit switching, message switching, or packet switching to pass messages and signals. Multiple nodes may cooperate to pass a message from an originating node to the destination node over multiple network hops. For this routing function, each node in the network is assigned a network address that identifies and locates it on the network.
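To make the multi-hop idea concrete, the following sketch models a network as a set of bidirectional links between addressed nodes and finds a route with the fewest hops. The single-letter addresses, the example topology, and the function name `find_route` are hypothetical, and breadth-first search is used only as a simple stand-in; it is not a claim about how any particular network or routing protocol selects paths.

```python
from collections import deque


def find_route(links, source, destination):
    """Return a fewest-hop list of node addresses, or None if unreachable."""
    # Build an adjacency list; every link is treated as bidirectional.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous = {source: None}             # node -> node it was reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            # Walk the predecessor chain back to the source.
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:       # first visit is the fewest-hop visit
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical five-node topology with two possible paths from A to D.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
route = find_route(links, "A", "D")
hops = len(route) - 1                     # number of links traversed
```

Here the message from `A` reaches `D` through the intermediate node `E`, so two nodes cooperate with the originator to deliver it in two hops.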