
Continual learning

Autonomous robots are expected to learn multiple tasks incrementally over their lifetime. However, learning in (deep) neural networks suffers from a fatal problem known as “catastrophic forgetting”, in which previously learned content is easily overwritten when new content is learned, so “continual learning” cannot be expected. In this study, to mitigate catastrophic forgetting, we have investigated two approaches: pseudo-modularization of tasks using a fractal network structure, and regularization that holds important network parameters in place while reinitializing unnecessary ones.
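As an illustration of the regularization idea only, the following Python sketch penalizes changes to parameters judged important for earlier tasks, in the style of elastic weight consolidation; the diagonal importance weights and the quadratic penalty are assumptions for this example, not the exact method studied here.

    import torch

    def importance_penalty(model, importance, old_params, lam=1.0):
        """Quadratic penalty that anchors parameters judged important for
        previous tasks near their old values (EWC-style sketch).
        importance / old_params: dicts mapping parameter name -> tensor,
        e.g., a diagonal Fisher estimate taken after the previous task."""
        penalty = 0.0
        for name, p in model.named_parameters():
            penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
        return lam / 2.0 * penalty

    # During training on a new task, the penalty is simply added to the task loss:
    # loss = task_loss(model(x), y) + importance_penalty(model, fisher, theta_old)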

Latent space extraction

Rather than directly determining a policy from high-dimensional observation data, it is better to extract the essential information hidden in the observations and make decisions according to it. In this way, similar problems can be regarded as the same problem, and autonomous robots gain the ability to adapt easily to a wide variety of problems. In this research, we are developing new methods based on the variational autoencoder to extract such information (i.e., a low-dimensional latent space).
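For reference, the basic building block can be sketched in a few lines of PyTorch; the layer sizes and the Gaussian reconstruction loss below are illustrative assumptions.

    import torch
    import torch.nn as nn

    class VAE(nn.Module):
        """Minimal variational autoencoder: observations are compressed into
        a low-dimensional latent space and reconstructed from it."""
        def __init__(self, obs_dim, latent_dim=8, hidden=128):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, latent_dim)      # mean of q(z|x)
            self.logvar = nn.Linear(hidden, latent_dim)  # log-variance of q(z|x)
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, obs_dim))

        def forward(self, x):
            h = self.encoder(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
            return self.decoder(z), mu, logvar

    def vae_loss(x, recon, mu, logvar):
        # reconstruction term plus KL divergence to the standard normal prior
        recon_loss = ((recon - x) ** 2).sum(dim=-1)
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=-1)
        return (recon_loss + kl).mean()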

Multi-agent system

Multi-agent systems composed of autonomous robots are suitable for dealing with large-scale and complex problems. However, many traditional frameworks rely on a centralized system that must somehow gather all the information in the entire system, and are therefore not scalable. In this research, we propose bottom-up multi-agent reinforcement learning, in which autonomous robots understand and cooperate with each other through minimal mutual communication in a decentralized manner. Within this framework, we are working on several concrete problems; a toy illustration of the decentralized setting is given below.
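The sketch below illustrates only the flavor of that setting, not the proposed algorithm: each agent learns independently with tabular Q-learning and conditions only on a one-symbol message received from a neighbor (all sizes and names are hypothetical).

    import numpy as np

    class DecentralizedAgent:
        """Independent tabular Q-learner; the only coordination signal is a
        small discrete message from a neighbor, appended to the state."""
        def __init__(self, n_states, n_msgs, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
            self.q = np.zeros((n_states, n_msgs, n_actions))
            self.alpha, self.gamma, self.eps = alpha, gamma, eps

        def act(self, s, msg):
            if np.random.rand() < self.eps:          # epsilon-greedy exploration
                return np.random.randint(self.q.shape[-1])
            return int(np.argmax(self.q[s, msg]))

        def update(self, s, msg, a, r, s2, msg2):
            target = r + self.gamma * self.q[s2, msg2].max()
            self.q[s, msg, a] += self.alpha * (target - self.q[s, msg, a])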

Online reinforcement learning

Autonomous robots cannot be equipped in advance with programs for every task, so the ability to learn to achieve given objectives by trial and error is indispensable. Reinforcement learning, which has attracted much attention in recent years, is a methodology for learning an optimal policy that maximizes the sum of rewards obtained through interaction between an agent (i.e., the autonomous robot) and its environment. Our focus is not on the recently mainstream frameworks that accumulate and replay experience data, but on frameworks that keep learning from online experience, consuming each sample as it arrives.
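A minimal example of this style is TD(λ) with a linear function approximator, which uses each transition once as it arrives and keeps no replay buffer; the classic Gym-style env interface and the feature map phi below are assumptions for illustration.

    import numpy as np

    def online_td_lambda(env, phi, n_features, alpha=0.01, gamma=0.99, lam=0.9, episodes=100):
        """Online TD(lambda) value estimation with linear function approximation.
        phi(s) maps a state to a feature vector of length n_features."""
        w = np.zeros(n_features)
        for _ in range(episodes):
            s, done = env.reset(), False
            z = np.zeros(n_features)           # eligibility trace
            while not done:
                a = env.action_space.sample()  # evaluate, e.g., a random policy
                s2, r, done, _ = env.step(a)
                delta = r + (0.0 if done else gamma * (w @ phi(s2))) - w @ phi(s)
                z = gamma * lam * z + phi(s)
                w += alpha * delta * z         # update now, then discard the sample
                s = s2
        return w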

Reservoir computing

Reservoir computing (RC), typified by the liquid state machine (LSM) and the echo state network (ESN), was devised as an information-processing structure that imitates the cerebellum for motor control. RC is composed of three layers: an input layer, a reservoir layer that is a kind of recurrent neural network with internal state, and an output layer. The most important feature of RC is that the only weights to be learned are those of the readout connecting the reservoir layer to the output layer.
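The following NumPy sketch of an echo state network makes this concrete: the input and recurrent weights stay fixed after random initialization, and only the readout is fitted, here by ridge regression (sizes and scalings are illustrative).

    import numpy as np

    def train_esn(inputs, targets, n_res=200, rho=0.9, ridge=1e-6, seed=0):
        """Minimal echo state network; inputs: (T, n_in), targets: (T, n_out).
        Only the linear readout W_out is trained."""
        rng = np.random.default_rng(seed)
        W_in = rng.uniform(-0.5, 0.5, (n_res, inputs.shape[1]))
        W = rng.standard_normal((n_res, n_res))
        W *= rho / max(abs(np.linalg.eigvals(W)))   # set spectral radius to rho
        x = np.zeros(n_res)
        states = np.empty((len(inputs), n_res))
        for t, u in enumerate(inputs):              # fixed recurrent dynamics
            x = np.tanh(W_in @ u + W @ x)
            states[t] = x
        # ridge regression for the readout: the only weights that are learned
        W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                                states.T @ targets).T
        return W_in, W, W_out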

Robust optimization

When autonomous robots try to learn tasks in real time from data collected in a real environment, the effects of noise and outliers contained in the data cannot be ignored. These effects are especially pronounced in reinforcement learning, where no supervised signals are available, and various methods for stabilizing learning have been proposed in recent years. In this research, we propose optimization methods that are robust to the noise and outliers that would otherwise destabilize learning.
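As one simple instance of the general idea (not our specific method), replacing the squared loss with the Huber loss bounds the influence each sample can exert on a gradient step, so a single outlier cannot dominate the update:

    import numpy as np

    def huber_grad_step(w, X, y, delta=1.0, lr=0.01):
        """One gradient step of linear regression under the Huber loss:
        quadratic for small residuals, linear for large ones, so outliers
        exert only a bounded pull on w."""
        r = X @ w - y
        g = np.clip(r, -delta, delta)   # derivative of the Huber loss w.r.t. the residual
        return w - lr * (X.T @ g) / len(y)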