Reinforcement Learning: Exploration–Exploitation Dilemma in Multi-Agent Foraging Task

Material type: Article
Description: 49(3), Jul-Sep 2012, 223-236p
In: Opsearch (Jan-Dec 2012)
Holdings
Item type: Articles | Current library: Main Library | Status: Available | Barcode: AR16014

The exploration–exploitation dilemma has been an unresolved issue within the framework of multi-agent reinforcement learning. Agents must either explore states that may yield higher rewards in the future or exploit the state that yields the highest reward according to their existing knowledge. Pure exploration degrades the agent's learning but increases its flexibility to adapt in a dynamic environment. On the other hand, pure exploitation drives the agent's learning process toward locally optimal solutions. Various learning policies have been studied to address this issue. This paper presents critical experimental results on a number of learning policies reported in the open literature. The learning policies greedy, ξ-greedy, Boltzmann Distribution (BD), Simulated Annealing (SA), Probability Matching (PM) and Optimistic Initial Values (OIV) are implemented to study their performance on a modelled multi-agent foraging task. Based on the numerical results obtained, the performance of each learning policy is discussed.
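As an illustration only (not taken from the paper), the sketch below shows how two of the action-selection policies named in the abstract, ξ-greedy and Boltzmann Distribution, are commonly implemented over tabular Q-values; all function names and parameter values here are assumptions for demonstration.

```python
# Hypothetical sketch of xi-greedy and Boltzmann action selection
# over a list of Q-values; not the paper's actual implementation.
import random
import math


def xi_greedy(q_values, xi=0.1):
    """With probability xi pick a random action (explore),
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < xi:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q/T);
    a high temperature approaches uniform exploration,
    a low temperature approaches greedy exploitation."""
    prefs = [math.exp(q / temperature) for q in q_values]
    total = sum(prefs)
    r, cumulative = random.random() * total, 0.0
    for action, p in enumerate(prefs):
        cumulative += p
        if r <= cumulative:
            return action
    return len(q_values) - 1


# Example: choose among three actions for one state of a foraging grid.
q = [0.2, 0.5, 0.1]
print(xi_greedy(q, xi=0.1))
print(boltzmann(q, temperature=0.5))
```

In Simulated Annealing-style policies the temperature in the Boltzmann rule is typically decayed over time, so the agent gradually shifts from exploration toward exploitation; the paper compares such schemes empirically on the foraging task.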

