At the IEEE International Conference on Development and Learning and on Epigenetic Robotics (ICDL-EPIROB 2019), held in Oslo, Norway, my student and I presented the following two topics.
“Reward-Punishment Actor-Critic Algorithm Applying to Robotic Non-grasping Manipulation”
“Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework”
The first topic is a collaboration with ICS, TUM. It proposed an actor-critic algorithm built on the bio-inspired reward-punishment framework, in which reward and punishment are explicitly distinguished. Specifically, it proposed a rule for composing the value functions for reward and punishment (and their TD errors), and applied the composed signal to the policy gradient method. As a result, the composed signal explicitly includes not only the future rewards and punishments but also the immediate reward and punishment, as in animals.
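To make the idea of a composed signal concrete, here is a minimal sketch. It is not the composition rule from the paper: it simply assumes that reward and punishment each have their own critic, that punishment is given as a non-positive signal, and that the two one-step TD errors are summed before driving a softmax actor. The function names (`td_error`, `actor_update`) and all numeric values are hypothetical.

```python
import math


def td_error(signal, value, next_value, gamma=0.9):
    """One-step TD error for a single signal stream (reward or punishment)."""
    return signal + gamma * next_value - value


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_update(prefs, action, delta, lr=0.1):
    """Policy-gradient step for a softmax policy: prefs += lr * delta * grad log pi."""
    probs = softmax(prefs)
    return [
        p + lr * delta * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


# Separate critics: one value estimate for reward, one for punishment.
delta_r = td_error(signal=1.0, value=0.5, next_value=0.0)    # reward stream
delta_p = td_error(signal=-2.0, value=-0.5, next_value=0.0)  # punishment stream

# ASSUMPTION: the composed signal is the plain sum of the two TD errors.
delta = delta_r + delta_p

# The composed signal is negative here, so the chosen action 0 becomes less likely.
new_prefs = actor_update([0.0, 0.0], action=0, delta=delta)
```

Because each TD error contains its own immediate signal, the composed value carries both the immediate reward and punishment and the bootstrapped estimates of the future ones.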
The second topic is a Paper Abstracts submission, and it proposed a new paradigm for reinforcement learning that uses hyperbolic discounting instead of exponential discounting. A hyperbolically-discounted TD error has already been proposed, but it can fail to discount properly. I therefore combined it with the reward-punishment framework designed in the above paper. As a result, the proposed method can discount reward and punishment asymmetrically, according to the design of their functions. This feature is consistent with animals, in which the discount factor for punishment is generally larger than the one for reward, and it actually improved the learning performance.
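The difference between the two discounting schemes, and what asymmetric discounting could look like, can be illustrated with a short sketch. It uses the textbook forms of exponential discounting (gamma to the power t) and hyperbolic discounting (1 / (1 + k t)); it is not the TD-error formulation from the paper, and the parameters `gamma` and `k` as well as the chosen values are assumptions for illustration only.

```python
def exponential_discount(t, gamma=0.9):
    """Standard exponential discounting: gamma ** t."""
    return gamma ** t


def hyperbolic_discount(t, k=1.0):
    """Textbook hyperbolic discounting: 1 / (1 + k * t)."""
    return 1.0 / (1.0 + k * t)


def discounted_return(signals, discount):
    """Sum of signals weighted by a discount function of the delay t."""
    return sum(discount(t) * s for t, s in enumerate(signals))


# A reward and a punishment of equal magnitude, both arriving two steps ahead.
rewards = [0.0, 0.0, 1.0]
punishments = [0.0, 0.0, -1.0]

# ASSUMPTION: punishment is discounted more slowly (smaller k) than reward,
# loosely mirroring a larger discount factor for punishment.
reward_return = discounted_return(rewards, lambda t: hyperbolic_discount(t, k=1.0))
punishment_return = discounted_return(punishments, lambda t: hyperbolic_discount(t, k=0.2))

# Hyperbolic discounting keeps far more weight on distant events than
# exponential discounting does.
far_exponential = exponential_discount(100)
far_hyperbolic = hyperbolic_discount(100)
```

With these hypothetical settings, the delayed punishment outweighs the delayed reward of the same magnitude, which is the kind of asymmetry described above.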