YIN Chenkun, JI Hongxuan, ZHANG Yanxin. Autonomous Decision-making of Searching and Rescue Robots Based on Off-policy Hierarchical Reinforcement Learning in a Complex Interactive Environment[J]. Journal of Beijing University of Technology, 2023, 49(4): 403-414. DOI: 10.11936/bjutxb2022090006
    Autonomous Decision-making of Searching and Rescue Robots Based on Off-policy Hierarchical Reinforcement Learning in a Complex Interactive Environment

    • Autonomous decision-making by robots in search and rescue tasks is of great significance for reducing the risk to human rescuers. To enable a robot to make decisions autonomously and plan paths reasonably when facing complex search and rescue tasks that admit multiple solutions, an off-policy hierarchical reinforcement learning algorithm was designed in this paper. The algorithm consists of two layers of Soft Actor-Critic (SAC) agents: the higher-level agent automatically generates the goals needed by the lower-level agent and provides intrinsic rewards that guide the lower-level agent in interacting directly with the environment. Under the framework of hierarchical reinforcement learning, the robot search and rescue task in a complex interactive environment was first described as a two-layer structure with a high-level semi-Markov decision process and a low-level Markov decision process. Separate state spaces, action spaces and reward functions were then designed for each level. Next, to address the problem that goals and reward functions in traditional reinforcement learning algorithms must be designed manually, a SAC-based off-policy hierarchical reinforcement learning algorithm was applied to train bipedal mobile robots to interact with the complex environment. Autonomous decision-making of the search and rescue robots was achieved through efficient use of data and adjustment of the goal space. The simulation results verify the effectiveness and generality of the proposed algorithm in solving complex multi-path search and rescue tasks.
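The two-layer structure described in the abstract can be illustrated with a minimal control-loop sketch. It is not the authors' implementation: the abstract does not give the state spaces, reward functions or network details, so everything here is an assumption made for illustration. The two policies are simple hand-written placeholders standing in for the SAC agents, the intrinsic reward is assumed to be the negative distance to the current goal, the environment is a toy 2-D point, and the names `high_level_policy`, `low_level_policy`, `env_step`, `run_episode` and `goal_period` are hypothetical.

```python
import math
import random


def intrinsic_reward(state, goal):
    """Assumed intrinsic reward: negative Euclidean distance to the goal."""
    return -math.dist(state, goal)


def high_level_policy(state, rng):
    # Placeholder for the higher-level SAC agent: proposes a goal near the state.
    return tuple(s + rng.uniform(-1.0, 1.0) for s in state)


def low_level_policy(state, goal):
    # Placeholder for the lower-level SAC agent: clipped step toward the goal.
    return tuple(max(-0.2, min(0.2, g - s)) for s, g in zip(state, goal))


def env_step(state, action, target):
    # Toy environment (assumption): a point that moves by the action;
    # the extrinsic reward is the negative distance to a fixed target.
    next_state = tuple(s + a for s, a in zip(state, action))
    return next_state, -math.dist(next_state, target)


def run_episode(steps=20, goal_period=5, seed=0):
    rng = random.Random(seed)
    state, target = (0.0, 0.0), (2.0, 1.0)
    high_buffer, low_buffer = [], []  # separate off-policy replay buffers
    goal, segment_start, segment_return = None, state, 0.0
    for t in range(steps):
        if t % goal_period == 0:
            if goal is not None:
                # One high-level (semi-Markov) transition spans several low-level steps.
                high_buffer.append((segment_start, goal, segment_return, state))
            goal = high_level_policy(state, rng)
            segment_start, segment_return = state, 0.0
        action = low_level_policy(state, goal)
        next_state, extrinsic = env_step(state, action, target)
        # The low-level (Markov) transition is rewarded intrinsically, relative to the goal.
        low_buffer.append(
            (state, goal, action, intrinsic_reward(next_state, goal), next_state)
        )
        segment_return += extrinsic
        state = next_state
    high_buffer.append((segment_start, goal, segment_return, state))
    return high_buffer, low_buffer
```

In this sketch the higher level stores one transition per goal segment while the lower level stores one per environment step, which mirrors the semi-Markov and Markov split described above. In an actual SAC-based setup both buffers would be sampled to update the corresponding agents.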
