AutoTD

One of our research goals is to develop algorithms that do not require manual tuning of their parameters. We make extensive use of algorithms based on temporal-difference (TD) learning, both for nexting and for learning in Horde. All of these algorithms have a step-size parameter that largely determines how fast they learn. So far, we have set it manually to a standard value. It would be preferable, however, to determine the right step-size automatically for each of the many different nexting and learning questions.
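To make the role of the step-size concrete, here is a minimal sketch of linear TD(0) with a fixed step size `alpha`, assuming a single nexting question with linear function approximation. The function name `td0_update` and the toy signal are illustrative assumptions, not Horde's implementation.

```python
from typing import List, Sequence


def td0_update(w: List[float], x: Sequence[float], reward: float,
               x_next: Sequence[float], alpha: float, gamma: float) -> float:
    """One linear TD(0) update of the weights w (in place); returns the TD error."""
    v = sum(wi * xi for wi, xi in zip(w, x))            # current prediction
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))  # prediction at next step
    delta = reward + gamma * v_next - v                 # TD error
    for i, xi in enumerate(x):
        # The fixed step size alpha scales every weight change.
        w[i] += alpha * delta * xi
    return delta


# Toy nexting question (illustrative): constant feature, constant signal of 1.
for alpha in (0.1, 0.01):
    w = [0.0]
    for _ in range(2000):
        td0_update(w, [1.0], 1.0, [1.0], alpha=alpha, gamma=0.9)
    print(alpha, round(w[0], 3))
```

With `gamma = 0.9` the true prediction is 1 / (1 - 0.9) = 10. After 2000 updates, `alpha = 0.1` has converged to it, while `alpha = 0.01` is still around 8.65. A hand-set value has to trade this speed against noise sensitivity, since on noisy signals a larger step size also makes the predictions noisier.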

Last year we developed Autostep, a tuning-free step-size adaptation algorithm for supervised learning. This year we developed a new algorithm, AutoTD, that automatically adapts the step-size parameter of the TD algorithm using essentially the same idea as Autostep. We tested AutoTD for nexting on robot data, and it improved performance over conventional TD on most of the nexting questions. Our future goal is to extend AutoTD to other TD-based algorithms, such as SARSA, Q-learning, and the GTD family of methods.
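Since AutoTD's update is not spelled out here, the sketch below covers only the Autostep side: a minimal Python version of an Autostep-style, per-feature step-size update for linear supervised learning. It follows our reading of the published Autostep algorithm: an IDBD-style meta-update with an exponent bounded by a running normalizer `v`, plus a normalization that keeps the effective step size on each example at most 1. The class name, the defaults for `alpha0`, `mu` and `tau`, and the plain-list representation are illustrative assumptions; this is not the AutoTD algorithm, and the details should be checked against the Autostep paper.

```python
import math
from typing import List, Sequence


class Autostep:
    """Autostep-style per-feature step-size adaptation (illustrative sketch)."""

    def __init__(self, n: int, alpha0: float = 0.1, mu: float = 0.01,
                 tau: float = 1e4) -> None:
        self.w: List[float] = [0.0] * n         # weights
        self.alpha: List[float] = [alpha0] * n  # per-feature step sizes
        self.h: List[float] = [0.0] * n         # trace of recent weight changes
        self.v: List[float] = [0.0] * n         # normalizer for the meta-update
        self.mu = mu                            # meta step size (assumed default)
        self.tau = tau                          # normalizer time scale (assumed default)

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float], y: float) -> float:
        delta = y - self.predict(x)
        for i, xi in enumerate(x):
            g = abs(delta * xi * self.h[i])
            # Track a slowly decaying upper bound on |delta * x_i * h_i|.
            self.v[i] = max(g, self.v[i] + self.alpha[i] * xi * xi * (g - self.v[i]) / self.tau)
            if self.v[i] != 0.0:
                # Exponent lies in [-mu, mu] because v_i >= |delta * x_i * h_i|.
                self.alpha[i] *= math.exp(self.mu * delta * xi * self.h[i] / self.v[i])
        # Scale step sizes so the effective step on this example is at most 1.
        m = max(sum(a * xi * xi for a, xi in zip(self.alpha, x)), 1.0)
        for i, xi in enumerate(x):
            self.alpha[i] /= m
            self.w[i] += self.alpha[i] * delta * xi
            self.h[i] = self.h[i] * (1.0 - self.alpha[i] * xi * xi) + self.alpha[i] * delta * xi
        return delta


# Usage (illustrative): learn a constant target of 3 from a single constant feature.
learner = Autostep(1)
for _ in range(2000):
    learner.update([1.0], 3.0)
print(learner.w[0], learner.alpha[0])
```

The normalization step is what makes the method tuning-free in practice. Because the summed effective step size never exceeds 1, an update can shrink the error on the current example but never overshoot it, whatever the initial `alpha0`.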