At the 2021 IEEE International Conference on Robotics and Automation (ICRA), held in China and online, I presented the following work.
"Proximal Policy Optimization with Relative Pearson Divergence"
The proposed method, PPO-RPE, is a modification of Proximal Policy Optimization (PPO), one of the most widely used modern reinforcement learning algorithms. PPO-RPE replaces PPO's heuristic clipping of the policy density ratio with a regularization term based on the relative Pearson (RPE) divergence, and by specifying the relative parameter appropriately, the regularization becomes symmetric around the mean of the density ratio. The theoretical derivation is quite clean, and in practice it performs at least as well as PPO.
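To make the idea concrete, here is a minimal sketch of a PPO-RPE-style loss, assuming a PyTorch setup. It estimates the relative Pearson divergence from the policy density ratio r = pi_new / pi_old on samples collected with the old policy, and subtracts it from the usual policy-gradient surrogate instead of clipping the ratio. The names `alpha` (relative parameter) and `beta` (regularization weight) are hypothetical hyperparameters for illustration, not values from the paper.

```python
import torch


def ppo_rpe_loss(log_prob_new: torch.Tensor,
                 log_prob_old: torch.Tensor,
                 advantages: torch.Tensor,
                 alpha: float = 0.5,
                 beta: float = 1.0) -> torch.Tensor:
    # Density ratio between the new and old policies.
    ratio = torch.exp(log_prob_new - log_prob_old.detach())

    # Standard (unclipped) policy-gradient surrogate.
    surrogate = ratio * advantages

    # Sample-based estimate of the relative Pearson divergence,
    #   PE_alpha = 0.5 * (1 - alpha)^2 * E_old[(r - 1)^2 / (alpha * r + 1 - alpha)],
    # which penalizes the new policy for drifting away from the old one.
    rpe = 0.5 * (1.0 - alpha) ** 2 * (ratio - 1.0) ** 2 / (alpha * ratio + 1.0 - alpha)

    # Maximize the regularized surrogate (returned as a loss to minimize).
    return -(surrogate - beta * rpe).mean()
```

Unlike PPO's clipped objective, which simply cuts the gradient once the ratio leaves its clipping interval, the divergence term applies a smooth penalty everywhere, and choosing the relative parameter appropriately makes that penalty roughly symmetric around the mean of the density ratio.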