A large number of real-world decision-making problems can be described as either a Reinforcement Learning (RL) or a Planning problem. Examples range from control engineering and robotics to operations research, finance, health sciences, and computer games. More often than not, these sequential decision-making problems have large state spaces, and solving them efficiently requires some form of function approximation. The appropriate choice of function approximation for a given problem, however, is far from trivial and depends on many factors, including the problem itself and the way one interacts with it. The high-level goal of this research, which is conducted by Amir-massoud Farahmand (PhD Candidate) and Csaba Szepesvari (PI), is to introduce flexible and statistically efficient methods that can automatically handle RL/Planning problems with large state spaces.

A principled approach to solving RL/Planning problems with large state spaces is to develop adaptive algorithms that automatically choose the best function approximator for a given problem. These methods reduce the amount of human intervention required to solve the problem. This automation would eventually allow industries and businesses to solve their sequential decision-making problems without the continual help of a reinforcement learning expert.

Adaptive algorithms have two main components (see the figure). The first is a flexible algorithm that, given a proper choice of its parameters, can work with a large class of function approximators. The second, called a model selection algorithm, uses data to tune the parameters of the flexible algorithm.

Even though adaptive algorithms have been studied in conventional machine learning settings, in which the goal is usually pattern recognition or prediction, few algorithms have been suggested for sequential decision-making problems. In our work, we develop novel adaptive algorithms for RL/Planning problems.

Our flexible algorithms are based on the idea of regularization. Regularization allows one to start with a large class of possible solutions for a given sequential decision-making problem and to select the simplest solution that explains the data/phenomenon sufficiently well. Our regularization-based algorithms are concrete examples of the Occam’s razor principle tailored to sequential decision-making problems. Our theoretical analyses show that this is a reasonable approach to designing flexible algorithms.
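
As a rough illustration of the trade-off described above, a generic regularized estimator can be written as follows. The notation is assumed here for illustration and is not taken from this project's papers: \mathcal{F} is the large class of candidate solutions, \ell is some data-fit loss evaluated on the i-th sample Z_i, J measures the complexity of a candidate (for example, a squared norm), and \lambda \ge 0 is the regularization coefficient.

```latex
\hat{Q} \in \operatorname*{arg\,min}_{Q \in \mathcal{F}}
  \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(Q; Z_i\bigr) \;+\; \lambda \, J(Q)
```

A larger \lambda favors simpler candidates, while a smaller \lambda favors a closer fit to the data. In this generic form, \lambda is the kind of parameter that must be tuned.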

Any flexible algorithm has some parameters to be tuned. Our proposed model selection algorithm uses data to predict the quality of each candidate set of parameters and chooses the best among them. This new model selection algorithm is also based on the idea of regularization. For more information on regularization in reinforcement learning, see www.SoloGen.net.
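
The two-component structure can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the algorithms developed in this project: the flexible algorithm is stood in for by plain ridge regression on polynomial features for a one-dimensional prediction task, and the model selection step is a simple hold-out validation rather than the regularization-based procedure mentioned above. All function names (`features`, `fit_ridge`, `select_lambda`) and the toy data are hypothetical.

```python
import math
import random


def features(x, degree=8):
    """Polynomial features 1, x, ..., x^degree (a deliberately large class)."""
    return [x ** k for k in range(degree + 1)]


def solve(a, b):
    """Solve the linear system a w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(xs, ys, lam):
    """Stand-in flexible algorithm: mean squared error plus lam * ||w||^2."""
    phi = [features(x) for x in xs]
    n, d = len(xs), len(phi[0])
    gram = [[sum(p[i] * p[j] for p in phi) / n + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(p[i] * y for p, y in zip(phi, ys)) / n for i in range(d)]
    return solve(gram, rhs)


def mse(w, xs, ys):
    """Mean squared prediction error of weights w on the pairs (xs, ys)."""
    preds = [sum(wi * fi for wi, fi in zip(w, features(x))) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


def select_lambda(train, valid, candidates):
    """Stand-in model selection: pick the lam with the lowest held-out error."""
    scores = {lam: mse(fit_ridge(train[0], train[1], lam), valid[0], valid[1])
              for lam in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(60)]
    ys = [math.sin(3.0 * x) + random.gauss(0.0, 0.1) for x in xs]  # toy data
    train, valid = (xs[:40], ys[:40]), (xs[40:], ys[40:])
    best, scores = select_lambda(train, valid, [1e-6, 1e-4, 1e-2, 1.0, 100.0])
    for lam, err in scores.items():
        print(f"lambda={lam:g}  held-out MSE={err:.4f}")
    print("selected lambda:", best)
```

The point of the sketch is the division of labour: `fit_ridge` can represent many different solutions depending on `lam`, and `select_lambda` uses data that was not used for fitting to decide which value of `lam` to trust.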