At Robomech2019 in Hiroshima, I presented the following two studies.
“Hierarchical Reinforcement Learning using Fractal Reservoir Computing for Quadrupedal Robot”
“Reinforcement Learning with Hyperbolic Discounting”
The first presentation is the work of a student of mine who graduated last year. It aims to reuse knowledge (skills, tasks, etc.) learned from samples as much as possible. This purpose can be achieved simply by combining continual learning with hierarchical learning. That is, a simple curriculum, in which agents learn hierarchical tasks in order from lower to upper ones, succeeded in retaining the knowledge through continual learning, and enabled the robot to learn complicated tasks that are difficult to acquire with end-to-end learning.
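The curriculum idea above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the presented implementation: the `Agent` class and the task names are made up, and real training would run reinforcement learning on each task. The point it shows is that the agent's parameters are carried over, not reset, between tasks, so knowledge accumulates as the curriculum moves from lower to upper tasks.

```python
# Hypothetical sketch of a curriculum with continual learning.
# `Agent` and the task names are illustrative, not the presented code.

class Agent:
    def __init__(self):
        self.skills = []  # stands in for the learned parameters

    def train(self, task):
        # In a real system this would run RL on `task`; here we just
        # record which tasks have been learned, in order.
        self.skills.append(task)

# Lower tasks first, upper tasks later; one agent persists throughout.
curriculum = ["balance", "step", "walk", "navigate"]
agent = Agent()
for task in curriculum:
    agent.train(task)  # parameters carried over between tasks
```

By contrast, end-to-end learning would attempt the final task directly with a fresh agent, without the intermediate skills to build on.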
The second presentation proposes a new definition of the return in reinforcement learning. In general, the return is defined with exponential discounting, which is not a suitable model of human decision making, so I redefined it with hyperbolic discounting, which is often employed to model human decision making. This redefinition can be derived without problems under a reward-punishment framework, which explicitly separates reward (positive evaluation) and punishment (negative evaluation). The results suggest that the design of the reward and punishment functions is important for obtaining better learning performance.
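To make the difference between the two discounting schemes concrete, here is a minimal sketch (not the presented derivation) comparing the standard exponentially discounted return, with weights $\gamma^t$, against a hyperbolically discounted one, with weights $1/(1+kt)$. The values of `gamma` and `k` are illustrative assumptions.

```python
# Comparing exponential and hyperbolic discounting on one reward sequence.
# gamma and k below are illustrative values, not tuned parameters.

def exponential_return(rewards, gamma=0.9):
    """Standard return: sum over t of gamma**t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def hyperbolic_return(rewards, k=0.1):
    """Hyperbolically discounted return: sum over t of r_t / (1 + k*t)."""
    return sum(r / (1.0 + k * t) for t, r in enumerate(rewards))

rewards = [1.0] * 10  # a constant reward for 10 steps
print(exponential_return(rewards))  # exponential weights decay quickly
print(hyperbolic_return(rewards))   # hyperbolic weights decay more slowly
```

For the same reward sequence, the hyperbolic weights shrink more slowly with the delay $t$, so distant rewards keep more influence on the return than under exponential discounting.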