Reinforcement learning (RL) is a common approach to training autonomous agents that learn to perform complex tasks by interacting with their environment. Through a reward signal, RL enables them to learn the best course of action in different situations and to adapt to their environment.
A major challenge in RL is how to efficiently explore the vast state space of many real-world problems. This challenge arises because RL agents learn by interacting with their environment through exploration. Think of an agent trying to play Minecraft. If you’ve heard of it before, you know how complex the Minecraft crafting tree can be. There are hundreds of craftable items, and you often need to craft one item before you can craft another, and so on. So it’s a really complicated environment.
Since the environment can contain a large number of possible states and actions, it can be difficult for an agent to find the optimal policy through random exploration alone. The agent must balance exploiting its current best policy with exploring new parts of the state space to find a potentially better one. Finding efficient exploration methods that strike this balance is an active area of research in RL.
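The exploration-exploitation trade-off described above can be illustrated with epsilon-greedy action selection on a toy multi-armed bandit. This is a minimal, generic sketch and not part of DECKARD; the function name, the arm means, and the epsilon value are illustrative assumptions.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate arm values with epsilon-greedy action selection.

    With probability epsilon the agent explores (picks a random arm);
    otherwise it exploits the arm with the highest current estimate.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # number of pulls per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward signal
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, _ = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print(counts)
```

With a small epsilon, most pulls go to the arm the agent currently believes is best, while the occasional random pull keeps the other estimates from going stale. Setting epsilon to 0 can lock the agent onto whichever arm looked good first, which is the failure mode that purely greedy exploitation risks.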
It is well known that practical decision-making systems need to make efficient use of prior knowledge about a task. With prior information about the task itself, the agent can adapt its policy more effectively and avoid getting stuck in suboptimal policies. However, most reinforcement learning methods are currently trained from scratch, without any prior knowledge or external information.
But why is this the case? In recent years, there has been growing interest in using large language models (LLMs) to assist RL agents with exploration by providing external knowledge. This approach has shown promise, but many challenges remain, such as grounding the LLM’s knowledge in the environment and coping with inaccurate LLM outputs.
So, should we abandon the idea of using LLMs to help RL agents? If not, how can we fix these problems and use LLMs to guide RL agents? The answer has a name: DECKARD.
DECKARD is trained on Minecraft, where crafting a specific item can be a challenging task without expert knowledge of the game. Prior studies have shown that achieving a goal in Minecraft becomes easier with dense rewards or expert demonstrations. As a result, crafting items in Minecraft has become a persistent challenge in AI.
DECKARD uses few-shot prompting on a large language model (LLM) to generate an Abstract World Model (AWM) of subgoals. The LLM is used to hypothesize the AWM, which means that DECKARD dreams about the task and the steps to solve it. It then wakes up and learns a modular policy for the subgoals it generated while dreaming. Because this happens in the real environment, DECKARD can verify the hypothesized AWM: the AWM is corrected during the wake phase, and discovered nodes are marked as verified so they can be reused in the future.
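The dream/wake loop can be sketched on a toy crafting graph. This is a hypothetical simplification, not DECKARD’s actual implementation: the AWM is a hand-written dict standing in for the LLM’s output, true_recipes stands in for the Minecraft environment, and the names plan and dream_and_wake are made up for illustration. The sketch also assumes the environment reveals an item’s true prerequisites whenever the agent attempts it, and that every hypothesized item actually exists.

```python
from graphlib import TopologicalSorter

# Hypothetical AWM "hypothesized by an LLM": item -> set of prerequisite items.
# It deliberately omits that a wooden pickaxe also needs a crafting table.
hypothesized_awm = {
    "log": set(),
    "planks": {"log"},
    "stick": {"planks"},
    "crafting_table": {"planks"},
    "wooden_pickaxe": {"planks", "stick"},
}

# Hypothetical ground truth standing in for the real environment.
true_recipes = {
    "log": set(),
    "planks": {"log"},
    "stick": {"planks"},
    "crafting_table": {"planks"},
    "wooden_pickaxe": {"planks", "stick", "crafting_table"},
}


def plan(awm, target):
    """Dream: order the target and its hypothesized ancestors as subgoals."""
    needed, stack = set(), [target]
    while stack:  # collect every item the AWM says the target depends on
        item = stack.pop()
        if item not in needed:
            needed.add(item)
            stack.extend(awm.get(item, set()))
    # TopologicalSorter expects node -> predecessors, so prerequisites come first.
    graph = {item: awm.get(item, set()) for item in needed}
    return list(TopologicalSorter(graph).static_order())


def dream_and_wake(hypothesized, env_recipes, target, max_rounds=10):
    awm = {item: set(pre) for item, pre in hypothesized.items()}  # copy
    obtained, verified = set(), set()
    for _ in range(max_rounds):
        if target in obtained:
            break
        for subgoal in plan(awm, target):
            if subgoal in obtained:
                continue
            # Wake: attempt the subgoal; assume the environment reveals
            # the true prerequisites of anything the agent attempts.
            actual = env_recipes[subgoal]
            awm[subgoal] = set(actual)  # correct the hypothesized node
            if actual <= obtained:
                obtained.add(subgoal)
                verified.add(subgoal)  # grounded in experience, reusable later
            else:
                break  # missing prerequisite: re-plan with the corrected AWM
    return awm, verified, obtained


awm, verified, obtained = dream_and_wake(hypothesized_awm, true_recipes, "wooden_pickaxe")
print(sorted(verified))
```

In this toy run, the first attempt fails at the pickaxe, the AWM node is corrected to include crafting_table, the agent re-plans, and the second round succeeds, leaving every node on the path marked as verified. A real agent would have to learn a policy for each subgoal rather than succeed instantly, which is where the RL part of DECKARD comes in.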
Experiments show that LLM guidance is essential to exploration in DECKARD: a version of the agent without LLM guidance takes twice as long to craft most items during open-ended exploration. When exploring toward a specific task, DECKARD improves sample efficiency by orders of magnitude compared to comparable agents, demonstrating the potential of robustly applying LLMs to RL.
Check out the research paper, code, and project.
Ekrem Cetinkaya received his B.Sc. in 2018 and his M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and works as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.