This thesis examines recent developments in reinforcement learning methods, with a particular focus on industrial applications.
The first part of the thesis reviews the general framework behind reinforcement learning theory, starting from the definition of the agent-environment interaction. The agent is the decision-making component: it acts on the environment, which responds with an observation of the current state and a feedback signal called the reward or reinforcement. The objective of the agent is to maximize the discounted cumulative reward.
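As a sketch of this objective, using notation assumed here rather than taken from the text (reward $r_{t+1}$ received after acting at time step $t$, discount factor $\gamma \in [0, 1)$, policy $\pi$), the discounted cumulative reward and the agent's goal can be written as:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1},
\qquad
\pi^{*} \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_t \right]
```

A discount factor close to 1 makes the agent weigh long-term rewards almost as much as immediate ones, while a smaller value makes it more short-sighted.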
Overall, the agent-environment interaction can be cast as the maximization of an objective function, which allows real-world applications to be reformulated as problems that the agent can tackle.
The agent-environment interaction is then formalized as a Markov Decision Process, which allows the elements needed for learning to be treated within a mathematical model.
Then, the value function concept is introduced to let the agent evaluate its actions and choose the best one with respect to its objective. The learning process consists of the agent learning a policy and iteratively improving it, where the improvement is understood as maximizing the value function.
The thesis proceeds with a presentation of some of the main methods used to compute the value function. These methods derive from dynamic programming, and in particular from Bellman’s recursive equations. The recursive relation is the basis of several iterative algorithms used to approximate the value function.
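To illustrate how a Bellman recursion turns into an iterative algorithm, the following is a minimal sketch of value iteration, one standard dynamic programming method. The two-state toy problem, the data layout (`P`, `R`), and all function names are assumptions made for this example and do not come from the thesis.

```python
def q_value(P, R, V, s, a, gamma):
    """One-step lookahead: expected reward plus discounted value of the successors."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Repeatedly apply the Bellman optimality backup until V stops changing.

    P[s][a] is a list of (probability, next_state) pairs and
    R[s][a] is the expected immediate reward (both hypothetical layouts).
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        V_new = [
            max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
        for s in range(n_states)
    ]
    return V, policy


# Toy problem: action 0 stays in place, action 1 moves to the other state.
# Staying in state 1 yields reward 1; every other transition yields 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]

V, policy = value_iteration(P, R)
```

On this toy problem the values approach 9 for state 0 and 10 for state 1, and the greedy policy moves from state 0 to state 1 and then stays there. Note that the sketch needs the full transition model `P`, which is what makes it a model-based method.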
A class of model-free algorithms is then derived from the model-based methods proposed first, and the Q-learning algorithm in particular is presented. The application of Q-learning is then extended through function approximators, specifically neural networks or tree-based regressors, which can handle the high-dimensional state and action spaces often used to model real-world problems.
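As a rough illustration of the model-free idea, the sketch below runs tabular Q-learning on a hypothetical five-state chain. The environment, the hyperparameters, and all names are assumptions made for this example. For simplicity the agent follows a uniformly random behaviour policy instead of the more common epsilon-greedy one, which is admissible because Q-learning is off-policy.

```python
import random

N_STATES, GOAL = 5, 4      # hypothetical chain: states 0..4, goal at state 4
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic toy environment; the learner only sees its samples."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice((LEFT, RIGHT))  # uniform random behaviour policy
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if done:
                break
    return Q


Q = q_learning()
# Greedy action in each non-terminal state.
greedy_policy = [max((LEFT, RIGHT), key=lambda a: Q[s][a]) for s in range(GOAL)]
```

Unlike a dynamic programming method, this update never consults the transition probabilities: it relies only on sampled transitions `(state, action, reward, next_state)`. Replacing the table `Q` with a regressor is the step that leads to the function approximation variants mentioned above.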
In particular, recent papers have integrated Q-learning with deep neural networks, which has led to new algorithms that form the class of deep reinforcement learning algorithms.
These new algorithms are studied from a theoretical point of view and then applied to real-world problems. In this work, two main applications are investigated experimentally. The first is the development of a thermostat control algorithm for a real HVAC system.
The algorithm studied uses extra-trees regression to predict the value function within a batch Q-learning algorithm called fitted Q iteration. The second is lane keeping for autonomous driving, solved through deep reinforcement learning in the TORCS car-driving simulator. Several experiments are conducted to achieve robust results and to evaluate the convergence properties of these algorithms in different scenarios.
Finally, the characteristics, capabilities, and possible developments of these methods are discussed. Since reinforcement learning methods are proposed as an alternative to classical control theory methods, the characteristics of this approach are worth noting with regard to adaptability to dynamic and unknown environments, integration with deep neural networks for end-to-end learning, and the adoption of higher levels of programming abstraction.