Abstract:
Unlike earlier approaches that index the Q-value table by state-action pairs, this paper introduces a method that builds the Q-value table by discretizing time. The problem of selecting an action in a given state is thereby transformed into the problem of choosing an action at a given time step, which allows the Q-learning algorithm to be applied in a dynamic, continuous environment. First, a genetic algorithm is used for global path planning. Then, obstacles are avoided dynamically through Q-learning. The overall system follows a multi-layer path planning philosophy in which an "offline" stage is followed by an "online" stage. The experimental results show that a working path planning system for a mobile robot is achieved and indicate that the proposed methods are state-of-the-art.
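To make the time-indexed Q-value table concrete, the following minimal sketch replaces the usual (state, action) index with a (time step, action) index. It is an illustration under stated assumptions, not the implementation described in this paper: the three-lane corridor, the reward values, the hyperparameters, and the names `discretize_time`, `toy_reward`, `train_time_indexed_q`, and `greedy_plan` are all hypothetical.

```python
import random


def discretize_time(t, dt):
    """Map a continuous time t (seconds) to a time-step index of width dt."""
    return int(t // dt)


def toy_reward(action, obstacle_lane, planned_lane=1):
    """Hypothetical reward: large penalty for a collision, small penalty
    for leaving the lane given by the global (offline) path."""
    if action == obstacle_lane:
        return -10.0
    if action != planned_lane:
        return -1.0
    return 0.0


def train_time_indexed_q(obstacle_lanes, n_actions=3, episodes=500,
                         alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Q-learning with a table indexed by (time step, action).

    obstacle_lanes[t] is the lane blocked at time step t, or None.
    Here an action is simply the lane occupied at that time step
    (an assumption made to keep the example small).
    """
    rng = random.Random(seed)
    n_steps = len(obstacle_lanes)
    q = [[0.0] * n_actions for _ in range(n_steps)]
    for _ in range(episodes):
        for t in range(n_steps):
            # Epsilon-greedy choice of an action at time step t.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[t][i])
            reward = toy_reward(a, obstacle_lanes[t])
            # The successor entry is the next time step, not a next state.
            future = max(q[t + 1]) if t + 1 < n_steps else 0.0
            q[t][a] += alpha * (reward + gamma * future - q[t][a])
    return q


def greedy_plan(q):
    """Pick the highest-valued action at every time step."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in q]
```

With an obstacle blocking the planned lane at time steps 2 and 3, for example `greedy_plan(train_time_indexed_q([None, None, 1, 1, None]))`, the learned plan keeps to lane 1 while it is free and moves to a neighbouring lane while it is blocked. In the terms of the abstract, the planned lane stands in for the output of the offline global planner, and the table update stands in for the online avoidance layer.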