Volume 27 Issue 2 - Publication Date: 1 February 2008
 
Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot
 
Gen Endo, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan, gendo@sms.titech.ac.jp; Jun Morimoto, ATR Computational Neuroscience Laboratories and Computational Brain Project, ICORP, Japan Science and Technology Agency, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan; Takamitsu Matsubara, ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan, and Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0192, Japan; Jun Nakanishi, ATR Computational Neuroscience Laboratories and Computational Brain Project, ICORP, Japan Science and Technology Agency, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan; and Gordon Cheng, ATR Computational Neuroscience Laboratories and ICORP, Japan Science and Technology Agency, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan
 
In this paper we describe a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve CPG-based biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm with the CPG by reducing the dimensionality of the state space used for learning. We demonstrate in numerical simulations that an appropriate feedback controller can be acquired within a few thousand trials, and that the controller obtained in simulation achieves stable walking with a physical robot in the real world. We evaluate walking velocity and stability in both numerical simulations and hardware experiments. The results suggest that the learning algorithm is capable of adapting to environmental changes. Furthermore, we present an online learning scheme that starts from an initial policy and improves the controller on a hardware robot within 200 iterations.
 