Accession Number ADA585093
Title Distributed Reinforcement Learning for Policy Synchronization in Infinite-Horizon Dec-POMDPs.
Publication Date 2012
Media Count 11p
Personal Author B. Banerjee; L. Kraemer
Abstract In many multi-agent tasks, agents face uncertainty about the environment, the outcomes of their actions, and the behaviors of other agents. Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful framework for modeling sequential, cooperative, multi-agent tasks under such uncertainty. Existing solution techniques for infinite-horizon Dec-POMDPs have assumed prior knowledge of the model and have required centralized solvers. We propose a method for learning Dec-POMDP solutions in a distributed fashion. We identify the policy synchronization problem that distributed learners face and propose incorporating rewards into their learned model representations to ameliorate it. Most importantly, we show that even when rewards are not visible to agents during policy execution, exploiting the information contained in reward signals during learning remains beneficial.
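The report itself is not reproduced in this record, so the following Python sketch is only an illustration of the abstract's central idea, not the authors' algorithm. It shows a hypothetical independent learner (the class name RewardAwareLearner, the window size, and the toy action names are all assumptions) whose internal state is a finite window of (observation, discretized reward) pairs, so that reward signals observed during learning are folded into the learned representation and can help keep independently learning agents synchronized on a joint policy.

    import random
    from collections import defaultdict

    class RewardAwareLearner:
        """Hypothetical independent learner for one Dec-POMDP agent.

        The internal state is a finite window of (observation, reward)
        pairs, so histories that differ only in received reward map to
        different states: the reward signal helps disambiguate what the
        other agents are doing during learning, even if rewards are
        hidden at execution time.
        """

        def __init__(self, actions, window=3, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.actions = list(actions)
            self.window = window          # finite memory for the infinite horizon
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.q = defaultdict(float)   # Q[(history, action)] -> value
            self.history = ()             # tuple of (obs, discretized reward)

        def act(self):
            # Epsilon-greedy action selection over the current internal state.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(self.history, a)])

        def update(self, action, obs, reward):
            # Fold the (discretized) reward into the next internal state.
            nxt = (self.history + ((obs, round(reward, 1)),))[-self.window:]
            best_next = max(self.q[(nxt, a)] for a in self.actions)
            key = (self.history, action)
            self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
            self.history = nxt

    # Each agent runs its own learner; no centralized solver is involved.
    agent = RewardAwareLearner(actions=["listen", "open-left", "open-right"])
    a = agent.act()
    agent.update(a, obs="hear-left", reward=-1.0)

Because each agent keys its values on reward-augmented histories, two agents that experience the same joint outcomes tend to carve up their internal states the same way, which is one plausible reading of how reward information could ameliorate the policy synchronization problem the abstract describes.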
Keywords Decision making
Dec-POMDPs (Decentralized partially observable Markov decision processes)
Learning
Markov processes
Multi-agent learning
Multiagent systems
Policies
Reinforcement learning
Source Agency Non Paid ADAS
NTIS Subject Category 92B - Psychology
72F - Statistical Analysis
Corporate Author University of Southern Mississippi, Hattiesburg. School of Computing.
Document Type Technical report
Title Note Technical rept.
NTIS Issue Number 1403
Contract Number W911NF-11-1-0124