Presented at IROS 2022

At the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), held in Kyoto, Japan, I gave the following presentation.

"L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning"

Deep reinforcement learning, which has become mainstream in recent years, tends to produce excessive output changes in response to noise in the state inputs, leading to problems such as overfitting. The proposed method, L2C2, counters this by improving the smoothness of the value and policy functions, which must be approximated in reinforcement learning, through regularization based on Lipschitz continuity. The key idea of L2C2 is to enforce Lipschitz continuity only within a local space defined by state transitions, since imposing global Lipschitz continuity can impair the expressiveness of the functions. This approach smooths the functions while preserving their expressiveness, thereby stabilizing learning and suppressing fluctuations in the robot's actions. This presentation received the SICE International Young Authors Award (SIYA-IROS2022).

arXiv
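To illustrate the idea of a local Lipschitz constraint, here is a minimal sketch of a hinge-style penalty on the local Lipschitz ratio of a function along observed state transitions. This is a hypothetical illustration only; the function name, the threshold `L_max`, and the hinge form are my assumptions, and the actual L2C2 regularizer differs in detail.

```python
import numpy as np

def local_lipschitz_penalty(f, s, s_next, L_max=1.0, eps=1e-8):
    # Estimate the local Lipschitz constant of f along each observed
    # transition (s -> s_next) and penalize only the excess over L_max.
    # Hypothetical sketch, not the paper's exact formulation.
    out_gap = np.linalg.norm(f(s) - f(s_next), axis=-1)  # output change
    in_gap = np.linalg.norm(s - s_next, axis=-1) + eps   # input change
    ratio = out_gap / in_gap                             # local slope
    return np.maximum(ratio - L_max, 0.0).mean()         # hinge penalty
```

In training, such a term would be added to the task loss with a small weight, so the function is smoothed only where consecutive states would otherwise produce disproportionately large output changes.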

In addition, my student presented the following poster in the Late Breaking Results session.

"Noise-Aware Stochastic Gradient Optimization with AdaTerm"

AdaTerm is a stochastic gradient descent method, a class of algorithms fundamental to updating networks in deep learning. By mathematically reinterpreting general stochastic gradient descent, we derived a mechanism that automatically detects noise in the gradients and excludes it from the updates. We are currently verifying the effectiveness of the method on various deep learning tasks, and we report some of the results here.

arXiv
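To give a flavor of what "detecting noise in the gradients and excluding it from the updates" can look like, here is a minimal sketch of a noise-aware moment estimate in which gradients far from the running mean receive a smaller update weight, in the spirit of Student-t-style robust averaging. The class name, the degrees-of-freedom parameter `nu`, and the weighting rule are my assumptions for illustration, not AdaTerm's exact update.

```python
import numpy as np

class RobustGradEMA:
    """Noise-aware running estimate of the gradient mean: outlier gradients
    are down-weighted instead of dominating the update. Hypothetical sketch,
    not the paper's exact algorithm."""

    def __init__(self, dim, beta=0.9, nu=5.0, eps=1e-8):
        self.m = np.zeros(dim)   # robust running mean of gradients
        self.v = np.ones(dim)    # running scale estimate
        self.beta, self.nu, self.eps = beta, nu, eps

    def update(self, g):
        # Squared deviation of the new gradient, normalized by the scale.
        d2 = (g - self.m) ** 2 / (self.v + self.eps)
        # Student-t-style weight: large deviation -> small weight.
        w = (self.nu + 1.0) / (self.nu + d2)
        # Shrink the effective step for suspicious (outlier) gradients.
        beta_eff = 1.0 - (1.0 - self.beta) * np.minimum(w, 1.0)
        self.m = beta_eff * self.m + (1.0 - beta_eff) * g
        self.v = beta_eff * self.v + (1.0 - beta_eff) * (g - self.m) ** 2
        return self.m
```

A plain exponential moving average with `beta=0.9` would jump to about 10.9 after a single outlier gradient of 100 following a stream of 1s; the weighted version above barely moves, which is the qualitative behavior a noise-aware optimizer targets.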