Accession Number ADA585093
Title Distributed Reinforcement Learning for Policy Synchronization in Infinite-Horizon Dec-POMDPs.
Publication Date 2012
Media Count 11p
Personal Author B. Banerjee; L. Kraemer
Abstract In many multi-agent tasks, agents face uncertainty about the environment, the outcomes of their actions, and the behaviors of other agents. Dec-POMDPs offer a powerful modeling framework for sequential, cooperative, multiagent tasks under uncertainty. Solution techniques for infinite-horizon Dec-POMDPs have assumed prior knowledge of the model and have required centralized solvers. We propose a method for learning Dec-POMDP solutions in a distributed fashion. We identify the issue of policy synchronization that distributed learners face and propose incorporating rewards into their learned model representations to ameliorate it. Most importantly, we show that even if rewards are not visible to agents during policy execution, exploiting the information contained in reward signals during learning is still beneficial.
Keywords Decision making
Dec-POMDPs (Decentralized partially observable Markov decision
Learning
Markov processes
Multi-agent learning
Multiagent systems
Policies
Reinforcement learning


 
Source Agency Non Paid ADAS
NTIS Subject Category 92B - Psychology
72F - Statistical Analysis
Corporate Author University of Southern Mississippi, Hattiesburg. School of Computing.
Document Type Technical report
Title Note Technical rept.
NTIS Issue Number 1403
Contract Number W911NF-11-1-0124
