Citation: SI Pengbo, WU Bing, YANG Ruizhe, LI Meng, SUN Yanhua. UAV Path Planning Based on Multi-agent Deep Reinforcement Learning[J]. Journal of Beijing University of Technology, 2023, 49(4): 449-458. DOI: 10.11936/bjutxb2022080007

    UAV Path Planning Based on Multi-agent Deep Reinforcement Learning

    • Abstract: To solve the path planning problem for multiple unmanned aerial vehicles (UAVs) in complex environments, a multi-agent deep reinforcement learning framework for UAV path planning is proposed. First, the path planning problem is modeled as a partially observable Markov decision process, and the proximal policy optimization algorithm is extended to the multi-agent setting. Obstacle-free path planning for multiple UAVs is achieved by designing the UAVs' state observation space, action space, and reward function. Second, to accommodate the limited computing resources carried by UAVs, a network pruning-based multi-agent proximal policy optimization (NP-MAPPO) algorithm is proposed, which improves training efficiency. Simulation results verify the effectiveness of the proposed multi-UAV path planning framework under various parameter configurations and the advantage of the NP-MAPPO algorithm in training time.

       
