General Atari 2600 Game Playing

Although the artificial intelligence community has achieved superb performance in narrow domains such as Chess or Backgammon, developing agents that can perform well across a wide variety of tasks has remained challenging. One of the main reasons for this difficulty is the lack of an appropriate and widely agreed-upon performance metric for measuring general agent ability. This project uses the large number of Atari 2600 video games as a platform for this purpose. The Atari 2600 was the dominant video game console during the late 1970s and early 1980s. With the prevalence of high-quality open-source Atari emulators, as well as 25 years of Moore’s Law, it is now feasible to use the set of Atari games as an AI testbed. Because so many different games are available, they can be split into training and test sets. Because each Atari game was designed to be interesting to humans, there is little risk of employing techniques that are so general as to be vacuous. And because any technique needs to work across a space of Atari games, there is minimal risk of overfitting to the specific aspects of any particular game. Any technique that improves the performance of an agent across a large set of Atari games is therefore more likely to be relevant to the broader goals of Artificial Intelligence.

Donkey Kong

We focused on methods that can play arbitrary Atari 2600 console games without game-specific assumptions or prior knowledge. Two main approaches were considered: reinforcement-learning methods and search methods. The reinforcement-learning methods used feature vectors generated from the game screen as well as the console RAM to learn to play a given game. The search-based methods used the emulator to simulate the consequences of actions into the future, aiming to play as well as possible while exploring only a very small fraction of the state space. To ensure the generic nature of our methods, all agents were designed and tuned using four specific games. Once development and parameter selection were complete, the performance of the agents was evaluated on a set of 50 randomly selected games. Significant learning was found for the reinforcement-learning methods on most games. Additionally, some instances of human-level performance were achieved by the search-based methods.
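The idea behind the search-based methods, using the emulator to look ahead before acting, can be illustrated with a minimal sketch. It is not the implementation used in this project: `ToyEmulator` is a hypothetical stand-in for a real Atari emulator (a one-dimensional track with a reward at one position), and `best_return` and `choose_action` are illustrative names for a plain depth-limited search. The only assumption about the emulator is that its state can be cloned and stepped forward without disturbing the real game.

```python
import copy


class ToyEmulator:
    """Hypothetical stand-in for an Atari emulator (not a real emulator API)."""

    ACTIONS = (-1, 0, 1)  # move left, stay, move right

    def __init__(self, position=0, goal=5):
        self.position = position
        self.goal = goal

    def clone(self):
        # Copy the state so simulated steps never affect the real game.
        return copy.copy(self)

    def step(self, action):
        # Apply the action and return the reward for the resulting state.
        self.position += action
        return 1.0 if self.position == self.goal else 0.0


def best_return(emulator, depth):
    """Best cumulative reward reachable within `depth` simulated steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in emulator.ACTIONS:
        sim = emulator.clone()
        reward = sim.step(action)
        best = max(best, reward + best_return(sim, depth - 1))
    return best


def choose_action(emulator, depth=3):
    """Pick the action whose simulated future yields the highest return."""

    def value(action):
        sim = emulator.clone()
        reward = sim.step(action)
        return reward + best_return(sim, depth - 1)

    return max(emulator.ACTIONS, key=value)


if __name__ == "__main__":
    game = ToyEmulator(position=3, goal=5)
    print(choose_action(game, depth=3))
```

The depth limit is what keeps the explored portion of the state space small: the search only ever visits states reachable within a few steps of the current one. A practical agent would need a more selective search than this exhaustive sketch, since the number of simulated action sequences grows exponentially with depth.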