Nexting on a Reinforcement-Learning Robot – Reinforcement Learning and Artificial Intelligence

The term “nexting” has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. When we hear a melody we predict what the next note will be or when the next downbeat will occur, and are surprised and interested when our predictions are disconfirmed. When we read a sentence we guess what the next word will be, or how the sentence will end. When we see a bird in flight, hear our own footsteps, or handle an object, we continually make and confirm multiple predictions about our sensory input. In all these examples, we continually predict what will happen to us next. When nexting, an individual may be predicting many or all of their sensory inputs, and at multiple time scales. When we read, for example, it seems likely that we next at the letter, word, and sentence levels, each involving a substantially different time scale. Nexting can be seen as the most basic kind of prediction, preceding and possibly underlying all the others. That people and a wide variety of animals learn and make simple predictions at a range of short time scales was established so long ago in psychology that it is known as “classical conditioning.” Animals seem wired to learn the predictive relationships of their world.

To be able to next is to have a basic kind of knowledge about how the world works in interaction with your body. To be able to learn to next, that is, to notice any disconfirmed predictions and continually adjust your nexting, is to be aware of your world in a significant way. To build a robot that can do both of these things is a natural goal, which we have pursued. Prior attempts to do this can be grouped into two approaches. The first approach is to build a very short-term model of the world’s dynamics, either in terms of differential equations or state-transition probabilities. This approach usually ends up being very different from nexting. The second approach, which we pursued on the Critterbot, is to use temporal-difference (TD) methods to learn long-term predictions directly. The prior work pursuing this approach has almost all been in simulation, and has used table-lookup representations and a small number of predictions. Our work is the first to demonstrate real-time nexting on a physical robot. We showed that thousands of anticipatory predictions at various time scales can be learned in parallel on a physical robot in real time using a reinforcement-learning methodology. We used a large feature representation and a standard TD learning algorithm, running on a consumer laptop, to make real-time predictions about the short-term future of the robot’s sensor readings. An example is shown below. The learning was entirely on-policy and used conventional TD algorithms. In future work we will extend it to off-policy learning and gradient-TD (GTD) algorithms.

“Nexting” in a reinforcement-learning robot. The black line shows readings from a light sensor peaking every 30 seconds as the robot passes a bright light once on each trip around its pen. The red line shows a learned prediction, based on other sensor readings, rising several seconds in advance of the actual light.
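
The kind of prediction described above can be imitated in a small simulation. The following is a minimal sketch under stated assumptions, not the code that ran on the robot. It uses linear TD(λ) with accumulating eligibility traces to learn, in parallel, several predictions of a simulated light reading, one per discount rate γ (a larger γ gives a longer time scale, roughly 1/(1 − γ) steps). Everything specific here is hypothetical: the function names, the 30-step lap, the one-hot position feature standing in for the robot's other sensors, and the values of the step size, λ, and γ. The real system used a far larger feature representation and thousands of predictions.

```python
def light_reading(t, period=30):
    """Simulated light sensor (assumed): reads 1.0 once per lap, else 0.0."""
    return 1.0 if t % period == 0 else 0.0


def features(t, period=30):
    """Hypothetical feature vector: one-hot position around the pen."""
    x = [0.0] * period
    x[t % period] = 1.0
    return x


def learn_predictions(gammas, steps=9000, alpha=0.2, lam=0.9, period=30):
    """Learn one linear TD(lambda) prediction per discount rate, in parallel."""
    w = {g: [0.0] * period for g in gammas}  # weights, one vector per time scale
    z = {g: [0.0] * period for g in gammas}  # eligibility traces, likewise
    x = features(0, period)
    for t in range(steps):
        x_next = features(t + 1, period)
        signal = light_reading(t + 1, period)  # the sensor value being predicted
        for g in gammas:
            v = sum(wi * xi for wi, xi in zip(w[g], x))
            v_next = sum(wi * xi for wi, xi in zip(w[g], x_next))
            delta = signal + g * v_next - v  # TD error
            for i in range(period):
                z[g][i] = g * lam * z[g][i] + x[i]  # accumulating trace
                w[g][i] += alpha * delta * z[g][i]  # TD(lambda) weight update
        x = x_next
    return w


def predict(w, t, period=30):
    """Current prediction at time step t for one learned weight vector."""
    return sum(wi * xi for wi, xi in zip(w, features(t, period)))


if __name__ == "__main__":
    weights = learn_predictions(gammas=(0.0, 0.8, 0.95))
    for g, w in weights.items():
        # Predictions over the last five steps before the light is reached.
        print(g, [round(predict(w, t), 2) for t in range(25, 30)])
```

With γ = 0 the learned prediction is just the next reading, so it is near zero until the step immediately before the light. With larger γ the prediction begins to rise many steps earlier, which is the qualitative behavior of the red line in the figure.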