This paper presents a modified distributed Q-learning algorithm, termed Sequential Q-learning with Kalman Filtering (SQKF), for multi-robot decision making. Although Q-learning is commonly employed in the multi-robot domain to support robot operation in dynamic and unknown environments, it faces several challenges there. Directly scaling the conventional single-agent Q-learning algorithm to the multi-robot domain is questionable because such an extension violates the Markov assumption on which the algorithm is based. Empirical results show that this violation can confuse the robots and prevent them from learning a good cooperative policy, owing to incorrect credit assignment among robots and to a robot's inability to observe the actions of the other robots in the same environment. The SQKF algorithm has two basic characteristics: (1) the learning process is sequential rather than parallel, i.e., the robots do not make decisions simultaneously but instead learn and decide according to a predefined sequence; (2) a robot does not update its Q-values with the observed global reward; instead, it employs a specific Kalman filter to extract its real local reward from the global reward and updates its Q-table with this local reward. The SQKF algorithm is intended to solve two problems in multi-robot Q-learning: credit assignment and behavior conflicts. The detailed procedure of the SQKF algorithm is presented and its application is illustrated. Empirical results show that it outperforms both the conventional single-agent Q-learning algorithm and the Team Q-learning algorithm in the multi-robot domain.
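The abstract's second characteristic, filtering a local reward out of the observed global reward before the Q-update, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' exact algorithm (which is only available in the PDF): it assumes a scalar Kalman filter in which the global reward is treated as a noisy observation of the robot's own local reward, with the other robots' contributions modeled as observation noise. All class and parameter names are hypothetical.

```python
import random

class SQKFRobot:
    """One robot in an illustrative SQKF-style learner (an assumption-laden
    sketch, not the paper's exact procedure)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 process_var=1e-2, obs_var=1.0):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma
        self.r_hat = 0.0                # Kalman estimate of the local reward
        self.P = 1.0                    # variance of that estimate
        self.process_var = process_var  # assumed drift of the true local reward
        self.obs_var = obs_var          # assumed variance of others' contributions

    def choose_action(self, state, eps=0.1):
        # Epsilon-greedy action selection; under the sequential scheme the
        # robots would call this one after another, in a predefined order.
        if random.random() < eps:
            return random.randrange(len(self.Q[state]))
        row = self.Q[state]
        return row.index(max(row))

    def filter_reward(self, global_reward):
        # Scalar Kalman filter: predict, then correct toward the observed
        # global reward, weighted by the Kalman gain K.
        self.P += self.process_var                  # predict step
        K = self.P / (self.P + self.obs_var)        # Kalman gain
        self.r_hat += K * (global_reward - self.r_hat)
        self.P *= (1.0 - K)
        return self.r_hat

    def update(self, s, a, global_reward, s_next):
        # Standard Q-learning update, but driven by the filtered local
        # reward rather than the raw global reward.
        r_local = self.filter_reward(global_reward)
        best_next = max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (r_local + self.gamma * best_next
                                      - self.Q[s][a])
```

With a steady global reward the filtered estimate converges toward that value, so the Q-update is driven by a denoised signal; in a true multi-robot run, each robot would maintain its own filter to strip out the other robots' share of the global reward.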
