Online reinforcement learning

Autonomous robots cannot be preprogrammed for every task in advance, so the ability to learn to achieve a given goal by trial and error is indispensable. Reinforcement learning, which has attracted considerable attention in recent years, is a methodology for learning the optimal policy, that is, the policy that maximizes the sum of rewards obtained through interaction between an agent (here, the autonomous robot) and its environment. Within reinforcement learning, our focus is not on the framework that accumulates and replays experience data, which has become mainstream in recent years, but on a framework that keeps learning from online experience. In this research, we are developing biologically inspired methods to further improve the performance of such online reinforcement learning.

Examples include:

Design of stochastic policy based on the search model of animals
Learning method in a framework that distinguishes between reward and punishment
Learning method based on bio-inspired reward discount model
Adaptive eligibility traces for online deep reinforcement learning

We are also conducting applied research, such as non-prehensile (non-grasping) object manipulation.
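
As a point of reference for the online setting and the eligibility traces mentioned above, the following is a minimal sketch of textbook tabular TD(lambda) with accumulating traces. It is not one of the methods developed in this research. The toy random-walk environment, the function name td_lambda_random_walk, and all parameter values are assumptions chosen only for illustration. Each transition is used once, at the moment it occurs, and no experience is stored for replay.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=200, alpha=0.1,
                          gamma=1.0, lam=0.8, seed=0):
    """Estimate state values online with TD(lambda) and accumulating traces.

    Hypothetical toy task: a 1-D chain where the agent steps left or right
    at random; leaving the chain on the right gives reward 1, on the left 0.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states            # value estimate for each state
    for _ in range(episodes):
        traces = [0.0] * n_states        # eligibility traces, reset per episode
        state = n_states // 2            # start in the middle of the chain
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            next_value = 0.0 if done else values[next_state]

            # One-step TD error computed from the transition just observed.
            td_error = reward + gamma * next_value - values[state]

            # Mark the current state as eligible, then update every state in
            # proportion to its trace and decay the traces.
            traces[state] += 1.0
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam

            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

The trace vector is what lets a single online update assign credit to recently visited states without a replay buffer. The decay factor lam controls how far back that credit reaches.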