Gait Learning Method for Quadruped Robots Based on Randomized Ensembled Network-TD3

Abstract: In quadruped robot skill learning, the twin delayed deep deterministic policy gradient (TD3) algorithm underestimates Q-values. This makes its value estimates inaccurate and degrades learning performance. To address this problem, a randomized ensembled network-TD3 (RE-TD3) algorithm is proposed. First, the algorithm ensembles multiple Q-value networks and randomly selects among them for value evaluation, which alleviates inaccurate value estimation and effectively improves policy performance. Second, a suitable reward function is designed to correctly guide the quadruped gait learning task. Finally, simulation experiments are conducted to validate the algorithm. The results show that RE-TD3 enables the quadruped robot to learn good locomotion gaits. Compared with TD3, the reward value increases by 32%, body stability improves by approximately 67%, and the deviation from the desired direction improves by 60%.
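The abstract states that RE-TD3 ensembles several Q-value networks and randomly selects some of them for evaluation, but it does not say how the selected estimates are combined. The sketch below shows one way to compute the critic's bootstrapped target under that idea. It assumes a REDQ-style rule (Randomized Ensembled Double Q-learning): draw a random subset of `m` out of `N` target critics once per update and take their minimum, with a `mean` option as an alternative. The function name, its parameters, and the NumPy-only setting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def randomized_ensemble_target(next_q, reward, done, gamma=0.99, m=2,
                               aggregate="min", rng=None):
    """Bootstrapped critic target from a random subset of an ensemble.

    Hypothetical helper (assumption, not the paper's code).
    next_q : shape (N, batch); row i is target critic i's estimate
             Q'_i(s', a') for the (smoothed) next action a'.
    reward, done : shape (batch,); done is 1.0 at terminal transitions.
    m : number of critics sampled without replacement for this update.
    aggregate : "min" (REDQ-style, assumed) or "mean".
    """
    rng = np.random.default_rng() if rng is None else rng
    next_q = np.asarray(next_q, dtype=float)
    n = next_q.shape[0]
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= N")
    # One random subset of critics per update, shared by the whole batch.
    idx = rng.choice(n, size=m, replace=False)
    subset = next_q[idx]
    if aggregate == "min":
        q_hat = subset.min(axis=0)
    elif aggregate == "mean":
        q_hat = subset.mean(axis=0)
    else:
        raise ValueError("aggregate must be 'min' or 'mean'")
    # One-step TD target; no bootstrapping past terminal transitions.
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    return reward + gamma * (1.0 - done) * q_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy ensemble of N = 5 critics evaluated on a batch of 3 transitions.
    next_q = rng.normal(loc=10.0, scale=1.0, size=(5, 3))
    reward = np.array([1.0, 0.5, -0.2])
    done = np.array([0.0, 0.0, 1.0])
    y = randomized_ensemble_target(next_q, reward, done, m=2, rng=rng)
    print(y.shape)  # (3,)
```

With `N = m = 2` and `aggregate="min"` this reduces to TD3's clipped double-Q target, the source of the underestimation the abstract describes. Because the minimum of a set never exceeds its mean, switching to `aggregate="mean"` (or sampling from a larger ensemble) gives a less pessimistic target; which choice RE-TD3 actually uses is not stated in the abstract.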

       
