At RSJ2017 in Saitama, I gave the following presentation.
“Actor-Critic Reinforcement Learning for Acquiring Global Optimum”
In this presentation, I tackled a problem of the conventional actor-critic algorithm in reinforcement learning, namely that it tends to fall into a local optimum. Specifically, I employed a Student's t-distribution as the stochastic policy instead of a normal distribution, because it offers both efficient exploration and conservative learning.
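To make the contrast between the two policy distributions concrete, here is a minimal sketch in Python. It is not the implementation from the presentation: the function names, the parameters `mu`, `sigma`, and `nu` (degrees of freedom), and the chosen values are all assumptions for illustration. It shows two standard properties of the Student's t-distribution that are one plausible reading of "efficient exploration" and "conservative learning": its heavier tails produce far-from-mean actions more often, and the gradient of its log-likelihood with respect to the mean stays bounded for outlying actions, whereas the normal distribution's gradient grows linearly.

```python
import math
import random


def sample_student_t(rng, mu, sigma, nu):
    """Draw one action from a location-scale Student's t-distribution."""
    z = rng.gauss(0.0, 1.0)
    # Chi-square with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return mu + sigma * z / math.sqrt(chi2 / nu)


def student_t_log_prob(a, mu, sigma, nu):
    """Log-density of the location-scale Student's t-distribution at a."""
    z = (a - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def grad_mu_student_t(a, mu, sigma, nu):
    """d/dmu of the Student's t log-density; bounded in (a - mu)."""
    d = a - mu
    return (nu + 1.0) * d / (nu * sigma * sigma + d * d)


def grad_mu_normal(a, mu, sigma):
    """d/dmu of the normal log-density; grows linearly in (a - mu)."""
    return (a - mu) / (sigma * sigma)


def tail_fraction(samples, threshold):
    """Fraction of samples whose absolute value exceeds threshold."""
    return sum(abs(s) > threshold for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, nu = 0.0, 1.0, 3.0  # hypothetical policy parameters
    n = 20000

    t_actions = [sample_student_t(rng, mu, sigma, nu) for _ in range(n)]
    n_actions = [rng.gauss(mu, sigma) for _ in range(n)]

    # Exploration: how often each policy proposes an action far from its mean.
    print("P(|a| > 3), Student's t:", tail_fraction(t_actions, 3.0))
    print("P(|a| > 3), normal:     ", tail_fraction(n_actions, 3.0))

    # Conservative update: gradient w.r.t. the mean for an outlying action.
    outlier = 10.0
    print("grad, Student's t:", grad_mu_student_t(outlier, mu, sigma, nu))
    print("grad, normal:     ", grad_mu_normal(outlier, mu, sigma))
```

In an actor-critic update, the gradient of the log-probability with respect to the mean is multiplied by an advantage or TD-error term, so a bounded gradient limits how far a single extreme action can pull the policy mean. Smaller `nu` makes the tails heavier, and as `nu` grows the distribution approaches the normal distribution.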