At the 26th Robotics Symposia, held online, my student and I presented the following topics.
“Reinforcement Learning Algorithm with Feedback and Feedforward Policies”
“Safe and Efficient Imitation Learning by Utilizing Safety Area Implied by Expert Demonstration and Application to Handwriting Robot”
In the first research topic, we focus on the fact that the policy in reinforcement learning corresponds to a feedback controller, and we propose a framework that integrates and simultaneously learns a feedforward controller to compensate for the feedback controller's main shortcoming, namely its sensitivity to sensing failures such as unexpected delays and data loss. Specifically, we formulate an optimization problem in which the predicted trajectory should stay close to an optimal trajectory and away from a non-optimal one. By modeling the dynamics that generate the predicted trajectory with a variational model, we derived a new learning rule that naturally regularizes between the feedback and feedforward controllers. It’s my honor to receive the Best Robotics Symposia Award for this work. Thank you very much.
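To give a rough feel for why a feedforward controller helps when sensing fails, here is a minimal sketch in Python. It is not the learning rule from the paper: both controllers are hand-written rather than learned, the fixed blend weight `fb_weight` stands in for the regularization that the paper derives, and the names `feedback_policy`, `feedforward_policy`, `composite_action`, and `rollout`, along with the toy dynamics, are all assumptions made for illustration.

```python
def feedback_policy(observation, gain=1.5):
    """Hypothetical feedback controller: proportional action on the observed state."""
    return -gain * observation


def feedforward_policy(prev_action, decay=0.9):
    """Hypothetical feedforward controller: predicts the next action
    from the previous action only, without using any sensing."""
    return decay * prev_action


def composite_action(observation, prev_action, fb_weight=0.7):
    """Blend both controllers; use feedforward alone when the observation is lost.

    The fixed weight is an assumption for illustration, not the learned
    regularization described in the text.
    """
    ff = feedforward_policy(prev_action)
    if observation is None:  # sensing failure (delay or loss)
        return ff
    fb = feedback_policy(observation)
    return fb_weight * fb + (1.0 - fb_weight) * ff


def rollout(x0, steps, dropped_steps, dt=0.1):
    """Drive a toy 1-D state toward zero while some observations are dropped."""
    x, prev_action = x0, 0.0
    for t in range(steps):
        observation = None if t in dropped_steps else x
        action = composite_action(observation, prev_action)
        x = x + dt * action  # toy integrator dynamics (assumed)
        prev_action = action
    return x
```

Calling `rollout(1.0, 10, {3, 4})` simulates two consecutive lost observations; during those steps the controller keeps acting through the feedforward term instead of stalling.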
In the second research topic, we address a drawback of Behavioral Cloning from Observation (BCO), a type of imitation learning: it must interact with the real environment using a non-optimal policy, which carries a risk of failure. To mitigate this, we focus on the fact that the expert’s data is obtained from within a safe region. Based on this fact, we propose a framework that extracts the latent generative space of the expert’s data using deep learning techniques and resets the interaction whenever the data obtained by the non-optimal policy does not match that space.
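The reset idea in the second topic can be sketched as follows. This is only an illustration under strong assumptions: instead of the deep latent model used in the actual work, it fits a simple per-dimension Gaussian to the expert's states and treats anything far from it as outside the safe region. The function names `fit_safe_region` and `should_reset` and the threshold of 3 standard deviations are hypothetical choices, not taken from the paper.

```python
import math


def fit_safe_region(expert_states):
    """Fit a per-dimension mean and standard deviation to expert states.

    A stand-in (assumption) for the learned latent generative space.
    """
    n = len(expert_states)
    dims = len(expert_states[0])
    means = [sum(s[d] for s in expert_states) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((s[d] - means[d]) ** 2 for s in expert_states) / n
        stds.append(math.sqrt(var) + 1e-8)  # small constant avoids division by zero
    return means, stds


def should_reset(state, region, threshold=3.0):
    """Return True when any dimension of the state lies more than
    `threshold` standard deviations from the expert mean."""
    means, stds = region
    return any(abs(x - m) / s > threshold for x, m, s in zip(state, means, stds))
```

During interaction, the learner would call `should_reset` on each newly observed state and stop the episode as soon as it returns `True`, so that the non-optimal policy does not keep acting outside the region covered by the expert.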