Recently, deep reinforcement learning (DRL) has been widely applied to deterministic games (Silver et al., 2018), video games (Mnih et al., 2015; Mnih et al., 2016; Silver et al., 2016), sensor networks (Kim et al., 2020), and complex robotic tasks (Andrychowicz et al., 2017; Hwangbo et al., 2019; Seo et al., 2019; Vecchietti et al., 2020; Vecchietti, Seo & Har, 2020). Despite the breakthrough results achieved in the field of DRL, deep learning in multi-agent environments that require both cooperation and competition is still challenging. Promising results have been achieved for cooperative-competitive multi-agent games such as StarCraft (Vinyals et al., 2019) and Dota (Berner et al., 2019). For multi-agent problems such as multi-robot soccer (Liu et al., 2019), security (He, Dai & Ning, 2015; Klima, Tuyls & Oliehoek, 2016), traffic control (Chu et al., 2019; Zhang et al., 2019), and autonomous driving (Shalev-Shwartz, Shammah & Shashua, 2016; Sallab et al., 2017), non-stationarity, partial observability, multi-agent training schemes, and heterogeneity can be challenging issues (Nguyen, Nguyen & Nahavandi, 2020).

To solve these challenges, multi-agent reinforcement learning (MARL) techniques (Lowe et al., 2017; Sunehag et al., 2017; Foerster et al., 2018; Vinyals et al., 2019; Liu et al., 2019; Samvelyan et al., 2019; Rashid et al., 2020) have been intensively investigated. Several MARL works have used the centralized training with decentralized execution (CTDE) framework (Lowe et al., 2017; Sunehag et al., 2017; Foerster et al., 2018; Rashid et al., 2020). In the CTDE framework, the local observations of the agents, the global state of the environment, and the joint actions taken by the agents at each time step are available to the centralized network during training, while only the local observations of the agents are available during execution. In other words, each agent selects its action, the output of its policy network, without access to the full information of the environment. To address the non-stationarity problem, multi-agent deep deterministic policy gradient (MADDPG) (Lowe et al., 2017) was proposed, combining the CTDE framework with the deep deterministic policy gradient (DDPG) actor-critic algorithm for continuous action spaces (Lillicrap et al., 2015). When cooperative behavior is to be achieved, meaning that there is a cooperative reward that should be maximized by multiple agents, credit should be assigned to each agent according to its contribution. To address this credit assignment problem, counterfactual multi-agent policy gradients (COMA) (Foerster et al., 2018), value decomposition networks (VDN) (Sunehag et al., 2017), and monotonic value function factorization (QMIX) (Rashid et al., 2020) have been proposed, combining the CTDE framework with value-based algorithms such as deep Q-networks (DQN) (Mnih et al., 2013), deep recurrent Q-networks (DRQN) (Hausknecht & Stone, 2015), and dueling Q-networks (Wang et al., 2016).
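To make the CTDE information flow concrete, the following is a minimal PyTorch sketch, not the implementation of any of the cited methods: each decentralized actor conditions only on its own local observation, while a centralized critic additionally receives the global state and the joint action, information that is available only during training. All names and dimensions (OBS_DIM, ACT_DIM, STATE_DIM, N_AGENTS) are illustrative assumptions.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, STATE_DIM, N_AGENTS = 16, 4, 32, 5  # illustrative sizes

class Actor(nn.Module):
    """Decentralized actor: maps one agent's local observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh(),  # continuous action in [-1, 1]
        )

    def forward(self, obs):  # obs: (batch, OBS_DIM)
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Centralized critic: sees the global state and the joint action of all
    agents during training; this information is unavailable at execution."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_AGENTS * ACT_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, joint_action):
        return self.net(torch.cat([state, joint_action], dim=-1))

# Execution is decentralized: each agent acts from its own observation only.
actors = [Actor() for _ in range(N_AGENTS)]
obs = torch.randn(N_AGENTS, 1, OBS_DIM)
actions = [actor(o) for actor, o in zip(actors, obs)]

# Training is centralized: the critic scores the joint behavior.
state = torch.randn(1, STATE_DIM)
q_value = CentralizedCritic()(state, torch.cat(actions, dim=-1))
```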
In multi-agent reinforcement learning, the cooperative learning behavior of agents is very important. In the field of heterogeneous multi-agent reinforcement learning, cooperative behavior among different types of agents in a group is pursued. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior; however, this method brings limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, two-stage heterogeneous centralized training, which allows the training of multiple roles of heterogeneous agents, is proposed. During training, two training processes are conducted in series. One of the two stages trains each agent according to its role, aiming at the maximization of its individual role reward. The other trains the agents as a whole so that they learn cooperative behaviors while attempting to maximize a shared collective reward, e.g., the team reward. Because these two training processes are conducted in series at every time step, agents can learn how to maximize role rewards and team rewards simultaneously.

The proposed method is applied to 5 versus 5 AI robot soccer for validation. The experiments are performed in a robot soccer environment using the Webots robot simulation software. Simulation results show that the proposed method can train the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards than the other three approaches that can be used to solve cooperative multi-agent training problems. Quantitatively, a team trained by the proposed method improves the score concede rate by 5% to 30% in matches against evaluation teams, when compared to teams trained with the other approaches.
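As a rough illustration of how the two stages described above could interleave at every step, the sketch below alternates a per-agent update on an individual role objective with a joint update on a shared team objective. This is a hedged sketch only: the stand-in linear policies and the role_loss and team_loss placeholders are hypothetical and do not reproduce the paper's reward formulation.

```python
import torch

OBS_DIM, N_AGENTS = 16, 5  # illustrative sizes
agents = [torch.nn.Linear(OBS_DIM, 1) for _ in range(N_AGENTS)]  # stand-in policies
opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in agents]

def role_loss(agent, obs):
    # Stage 1 placeholder: one agent maximizes its individual role reward.
    return -agent(obs).mean()

def team_loss(agents, obs):
    # Stage 2 placeholder: all agents jointly maximize the shared team reward.
    return -torch.stack([a(obs) for a in agents]).mean()

for step in range(100):  # one iteration per environment time step
    obs = torch.randn(8, OBS_DIM)  # dummy batch of observations
    # Stage 1: role-specific update, one agent at a time.
    for agent, opt in zip(agents, opts):
        opt.zero_grad()
        role_loss(agent, obs).backward()
        opt.step()
    # Stage 2: joint update of all agents toward the shared team reward.
    for opt in opts:
        opt.zero_grad()
    team_loss(agents, obs).backward()
    for opt in opts:
        opt.step()
```

Running both updates in series at every step is what lets each agent pursue its role reward without drifting away from behavior that benefits the team reward.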