How to learn, represent, and use knowledge of the world in a general sense remains a key open problem in artificial intelligence. High-level representation languages based on first-order predicate logic and Bayesian networks are very expressive, but in these languages knowledge is difficult to learn and computationally expensive to use. There are also low-level languages, such as differential equations and state-transition matrices, that can be learned from data without supervision, but these are much less expressive. Knowledge that is even slightly forward-looking, such as ‘If I keep moving, I will bump into something within a few seconds’, cannot be expressed directly with differential equations and may be expensive to compute from them. There remains room for exploring alternative formats for knowledge that are expressive yet learnable from unsupervised sensorimotor data.
In this project we are pursuing a novel approach to knowledge representation based on the notion of value functions and on other ideas and algorithms from reinforcement learning. In our approach, knowledge is represented as a large number of approximate value functions learned in parallel, each with its own policy, pseudo-reward function, pseudo-termination function, and pseudo-terminal-reward function. Our architecture, called Horde, consists of a large set of independent reinforcement learning agents, which we call demons. This approach is similar to the one we have taken in previous years with TD networks and options. Horde differs from TD networks in its more straightforward handling of state and function approximation (no predictive state representations) and in its use of GTD algorithms for off-policy learning, which are considerably more efficient than those used in prior work with TD networks. We have deployed Horde on the Critterbot to learn a variety of predictions and behaviors off-policy and in real time.
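To make the architecture concrete, the following is a minimal sketch of a single demon. Each demon holds its own weight vector and updates it off-policy with a GTD-family algorithm, here GTD(λ); the class name, parameter names, and the tiny binary-feature run at the end are illustrative assumptions, not details from the project itself.

```python
import numpy as np

class Demon:
    """One Horde demon: learns an approximate value function off-policy.

    This is a sketch of the GTD(lambda) update; the interface (argument
    names, the toy feature vectors below) is hypothetical.
    """

    def __init__(self, n_features, alpha=0.1, beta=0.01, lam=0.9):
        self.theta = np.zeros(n_features)  # value-function weights
        self.w = np.zeros(n_features)      # auxiliary gradient-correction weights
        self.e = np.zeros(n_features)      # eligibility trace
        self.alpha, self.beta, self.lam = alpha, beta, lam

    def update(self, phi, phi_next, rho, pseudo_reward, gamma, gamma_next):
        """Process one transition observed under the behavior policy.

        rho is the importance-sampling ratio pi(s, a) / b(s, a) for this
        demon's target policy; gamma and gamma_next come from the demon's
        pseudo-termination function; pseudo_reward from its pseudo-reward
        function."""
        # TD error for this demon's prediction question
        delta = (pseudo_reward
                 + gamma_next * self.theta @ phi_next
                 - self.theta @ phi)
        # Importance-weighted eligibility trace
        self.e = rho * (phi + gamma * self.lam * self.e)
        # Main weights: TD update plus gradient correction term
        self.theta += self.alpha * (delta * self.e
                                    - gamma_next * (1.0 - self.lam)
                                      * (self.e @ self.w) * phi_next)
        # Auxiliary weights track the expected TD error
        self.w += self.beta * (delta * self.e - (self.w @ phi) * phi)
        return delta

# Tiny illustrative run on hand-made binary features
demon = Demon(3)
phi_a = np.array([1.0, 0.0, 0.0])
phi_b = np.array([0.0, 1.0, 0.0])
delta = demon.update(phi_a, phi_b, rho=1.0, pseudo_reward=1.0,
                     gamma=0.9, gamma_next=0.9)
```

In a full Horde, thousands of such demons run in parallel over a shared feature vector, each answering its own prediction or control question from the single behavioral data stream.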