Microsoft Open Sources Jericho to Train Reinforcement Learning Using Linguistic Games

Category: IT Technology · Published: 4 years ago


The new framework provides an OpenAI-like environment for language-based games.


Language is one of the hallmarks of human intelligence and one that plays a key role in our learning processes. By using language, we constantly formulate our understanding of a situation in a specific context. Many of the magical abilities of the human brain, such as commonsense reasoning, deduction or inference, are regularly expressed via language. It is hard to imagine any form of advanced artificial intelligence (AI) that wouldn't rely on language to express its interactions with a given environment. In recent years, reinforcement learning has shown some promise in helping AI agents learn their own languages for communication. To facilitate rapid development of language-based agents, Microsoft Research has open sourced Jericho, a learning environment that leverages language games to train reinforcement learning agents.

The idea of using language games to develop knowledge makes intuitive sense. Just as babies develop language by regularly interacting with objects, we can build AI environments with the right incentives to develop communication in reinforcement learning agents. Among the different game-based approaches in the language learning space, Jericho relies on a genre known as interactive fiction (IF) games.

IF Games

In computer science, IF games are defined as software environments in which players observe textual descriptions of the simulated world, issue text actions, and receive a score as they progress through the story. In other words, IF games are fully text-based simulations: every change a player makes to the environment is effected through a text command. IF games like Zork I have achieved incredible levels of popularity. To illustrate how IF games can be applied to train AI agents, consider the game illustrated in the following figure: a reinforcement learning agent could learn to interact with the environment by issuing language commands and receiving textual descriptions of the new state.
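To make the observe, act, score loop concrete, here is a minimal sketch of a toy IF game in Python. The class `ToyIFGame`, its two locations, its commands and its reward values are hypothetical illustrations loosely modeled on the opening of Zork I; they are not Jericho's API or any real game's logic.

```python
from __future__ import annotations


class ToyIFGame:
    """Hypothetical two-location text game (loosely inspired by Zork I); not Jericho."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> str:
        """Restore the starting world and return the opening description."""
        self.location = "behind house"
        self.window_open = False
        self.score = 0
        return self.describe()

    def describe(self) -> str:
        """Textual description of the player's current location."""
        if self.location == "kitchen":
            return "You are in the kitchen of the white house."
        window = "open" if self.window_open else "slightly ajar"
        return f"You are behind the white house. A small window is {window}."

    def step(self, command: str) -> tuple[str, int, bool]:
        """Apply one text command and return (observation, reward, done)."""
        command = command.strip().lower()
        if command == "open window" and self.location == "behind house" and not self.window_open:
            self.window_open = True
            return "With great effort, you open the window far enough to allow entry.", 0, False
        if command == "enter window" and self.location == "behind house" and self.window_open:
            self.location = "kitchen"
            self.score += 10  # hypothetical reward for reaching the kitchen
            return self.describe(), 10, True
        # Unrecognized or inapplicable commands leave the world unchanged.
        return "Nothing happens.", 0, False


game = ToyIFGame()
print(game.reset())
for cmd in ["enter window", "open window", "enter window"]:
    obs, reward, done = game.step(cmd)
    print(f"> {cmd}\n{obs} (reward={reward}, done={done})")
```

The agent only ever sees strings and a scalar reward, which is what lets a text game be treated as a reinforcement learning environment: entering the window fails until the window has been opened, so the order of commands matters.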

[Figure: an example IF game]

Like many natural language processing (NLP) tasks, IF games require natural language understanding, but unlike most NLP tasks, IF games are sequential decision-making problems in which actions change the subsequent world states of the game, and choices made early in a game may have long-term effects on the eventual endings. Furthermore, IF games come with their own set of natural language learning challenges:

· Combinatorial Action Space: Most reinforcement learning models have been designed to operate on either discrete or continuous action spaces. However, IF games require the agent to operate in the combinatorial action space of natural language. For example, an agent generating a four-word sentence from a modest vocabulary of 700 words is effectively exploring a space of 700⁴ ≈ 240 billion possible actions.

· Commonsense Reasoning: Due to the lack of graphics, IF games rely on the player’s commonsense knowledge as a prior for how to interact with the game world. For example, a human player encountering a locked chest intuitively understands that the chest needs to be unlocked with some type of key, and once unlocked, the chest can be opened and will probably contain useful items.

· Knowledge Representation: IF games span many distinct locations, each with unique descriptions, objects, and characters. Players move between locations by issuing navigational commands like "go west". Because connectivity between locations is not necessarily Euclidean, agents need to detect when a navigational action has succeeded or failed and whether the location reached was previously seen or new. Beyond location connectivity, it's also helpful to keep track of the objects present at each location, with the understanding that objects can be nested inside of other objects, such as food in a refrigerator or a sword in a chest.
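The knowledge-representation challenge can be illustrated with a small agent-side memory: a directed location graph that records whether each move failed, reached a new place or returned to a known one, plus a containment tree for nested objects. This is a minimal sketch under a strong assumption: location names are handed to it already extracted, whereas a real agent would have to infer them from the game's text. `KnowledgeTracker` and its methods are hypothetical names, not part of Jericho.

```python
from __future__ import annotations


class KnowledgeTracker:
    """Hypothetical agent-side memory: a directed location graph plus an object tree."""

    def __init__(self) -> None:
        # (location, direction) -> destination. Edges are directed because
        # going west and then east is not guaranteed to return you to the start.
        self.exits: dict[tuple[str, str], str] = {}
        self.visited: set[str] = set()
        # object -> whatever directly contains it (a location or another object)
        self.parent: dict[str, str] = {}

    def record_move(self, before: str, direction: str, after: str) -> str:
        """Classify a navigation attempt as 'failed', 'new' or 'revisited'."""
        self.visited.add(before)
        if after == before:
            return "failed"  # assumption: an unchanged location means the move failed
        self.exits[(before, direction)] = after
        status = "revisited" if after in self.visited else "new"
        self.visited.add(after)
        return status

    def place(self, obj: str, container: str) -> None:
        """Record that obj sits directly inside container."""
        self.parent[obj] = container

    def location_of(self, obj: str) -> str:
        """Walk up the containment chain until reaching an uncontained node."""
        while obj in self.parent:
            obj = self.parent[obj]
        return obj


tracker = KnowledgeTracker()
print(tracker.record_move("Clearing", "north", "Forest Path"))  # new
print(tracker.record_move("Forest Path", "up", "Forest Path"))   # failed
print(tracker.record_move("Forest Path", "west", "Clearing"))    # revisited
tracker.place("chest", "Attic")
tracker.place("sword", "chest")
print(tracker.location_of("sword"))                             # Attic
```

Note that the tracker never assumes a reverse edge: knowing that "north" leads from Clearing to Forest Path says nothing about where "south" leads from Forest Path.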

These challenges need to be addressed in order to make IF games a viable mechanism for training reinforcement learning agents.

Enter Microsoft Jericho

Jericho is a Python-based learning environment built on IF games. You can think of Jericho as the OpenAI Gym of language learning. Jericho is optimized for reinforcement learning models and provides features such as game-state persistence that could help enable capabilities like memory in reinforcement learning agents. In order to make IF games more accessible and to address some of the challenges mentioned in the previous section, Jericho includes the following features:

· World-Object-Tree Representation: Because of the large number of locations, objects, and characters in many games and the possibility of puzzles requiring objects not present in the current location, agents need to develop ways to remember and reason about previous interactions. World-object-tree representations of the game state enumerate these elements.

· Fixed Random Seed to Enforce Determinism: By making games deterministic, so that each subsequent state is a direct result of the specific action taken by an agent, Jericho enables the use of targeted exploration algorithms like Go-Explore, which systematically build and expand a library of visited states.

· Load/Save Functionality: This feature enables restoration of previous game states, which makes it possible to use planning algorithms like Monte Carlo tree search.

· World-Change Detection and Valid-Action Identification: This feature provides feedback on the success or failure of an agent’s last action to effect a change in the game state. Furthermore, Jericho can perform a search to identify valid actions, those that lead to changes in the game state.
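Load/save and world-change detection combine naturally into a search for valid actions: snapshot the state, try each candidate action, restore, and keep the candidates that changed the world. The sketch below applies that idea to a hypothetical three-variable world. `ToyEnv`, `initial_world`, `get_state`, `set_state` and `valid_actions` are names chosen for this illustration and should not be read as Jericho's API; Jericho performs its search on the real game's state, and its implementation may differ in detail.

```python
from __future__ import annotations

import copy


def initial_world() -> dict:
    """Hypothetical starting state of a tiny deterministic text world."""
    return {"location": "cellar", "lamp": "off", "trapdoor": "closed"}


class ToyEnv:
    """Hypothetical deterministic environment with save/load; not Jericho's API."""

    def __init__(self) -> None:
        self.world = initial_world()

    def get_state(self) -> dict:
        """Save: return an independent snapshot of the full world state."""
        return copy.deepcopy(self.world)

    def set_state(self, state: dict) -> None:
        """Load: restore a previously saved snapshot."""
        self.world = copy.deepcopy(state)

    def step(self, action: str) -> str:
        w = self.world
        if action == "turn on lamp" and w["lamp"] == "off":
            w["lamp"] = "on"
            return "The lamp is now on."
        if action == "open trapdoor" and w["trapdoor"] == "closed":
            w["trapdoor"] = "open"
            return "The trapdoor opens."
        if action == "go up" and w["trapdoor"] == "open":
            w["location"] = "living room"
            return "You climb up into the living room."
        return "Nothing happens."


def valid_actions(env: ToyEnv, candidates: list[str]) -> list[str]:
    """Keep the candidates whose execution changes the world state."""
    saved = env.get_state()
    valid = []
    for action in candidates:
        env.set_state(saved)          # every trial starts from the same state
        env.step(action)
        if env.get_state() != saved:  # world-change detection
            valid.append(action)
    env.set_state(saved)              # leave the environment as we found it
    return valid


env = ToyEnv()
candidates = ["turn on lamp", "open trapdoor", "go up", "eat lamp"]
print(valid_actions(env, candidates))  # ['turn on lamp', 'open trapdoor']
env.step("open trapdoor")
print(valid_actions(env, candidates))  # ['turn on lamp', 'go up']
```

Determinism is what makes this trial-based search sound: if the same action from the same restored state could produce different outcomes, a single trial would not tell the agent whether the action is valid.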


The current version of Jericho includes two learning agents, Template-DQN (TDQN) and the deep reinforcement relevance network (DRRN). TDQN is typically more effective in parser-based games: it handles the combinatorial action space by generating verb-object actions from a predefined set of verbs and objects. DRRN is better suited to choice-based games that present a series of choices at every state of the game. Jericho provides a common encoder so that both models receive a consistent input representation. While both agents use this common input representation, they differ in how they select actions. DRRN uses Jericho's valid-action identification to estimate a Q-value for each of the valid actions. It then either acts greedily by selecting the action with the highest Q-value or explores by sampling from the distribution of valid actions.
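DRRN's selection step, scoring each valid action and then either exploiting or exploring, can be sketched in a few lines. This sketch assumes the Q-values have already been produced by some network (not shown) and uses a softmax over Q-values as the exploration distribution; the function name `select_action`, the `temperature` parameter and the softmax choice are assumptions for illustration, not details taken from Jericho's code.

```python
from __future__ import annotations

import math
import random


def select_action(q_values: dict[str, float], explore: bool,
                  temperature: float = 1.0,
                  rng: random.Random | None = None) -> str:
    """Pick one valid action: argmax of Q when exploiting, softmax sample when exploring."""
    if not q_values:
        raise ValueError("need at least one valid action")
    if not explore:
        return max(q_values, key=q_values.get)  # greedy: highest Q-value
    rng = rng or random.Random()
    actions = list(q_values)
    top = max(q_values.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp((q_values[a] - top) / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


q = {"open window": 2.0, "go north": 0.5, "take leaflet": 1.0}
print(select_action(q, explore=False))                        # open window
print(select_action(q, explore=True, rng=random.Random(0)))   # a sampled action
```

Because only valid actions are scored, the model never has to search the full combinatorial space of sentences; lowering `temperature` makes exploration concentrate on the highest-valued actions.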

[Figure: agent architecture encoding the textual inputs Onar, Oinv and Odesc]

In the previous diagram, Onar, Oinv and Odesc are the elements that encode the textual input. After each command, Jericho agents use a common input representation that includes the current textual observation Onar, the inventory text Oinv, and the current location description Odesc (as given by a look command). For example, if we issue the command “open window” in the popular text-based game Zork I, Jericho will generate the following representation:

· Onar: With great effort, you open the window far enough to allow entry.

· Oinv: You are empty-handed.

· Odesc: You are behind the white house. A path leads into the forest to the east. In one corner of the house there is a small window which is slightly ajar.
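As a rough illustration of how three text fields can become one model input, the sketch below turns the Zork I example into a fixed-length vector by concatenating one bag-of-words count vector per field over a shared vocabulary. The actual agents use learned neural encoders, so this is only a stand-in; the bag-of-words scheme, the dictionary keys `o_nar`, `o_inv` and `o_desc` (mirroring Onar, Oinv and Odesc) and the function names are assumptions made for this example.

```python
from __future__ import annotations

import re

# Hypothetical field names mirroring Onar, Oinv and Odesc.
FIELDS = ("o_nar", "o_inv", "o_desc")


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic runs only ("empty-handed" -> "empty", "handed")."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(observations: list[dict[str, str]]) -> dict[str, int]:
    """Shared vocabulary over all fields, mapping each word to an index."""
    words = sorted({w for obs in observations for f in FIELDS for w in tokenize(obs[f])})
    return {w: i for i, w in enumerate(words)}


def encode(obs: dict[str, str], vocab: dict[str, int]) -> list[int]:
    """Concatenate one count vector per field; result length is 3 * len(vocab)."""
    vector: list[int] = []
    for f in FIELDS:
        counts = [0] * len(vocab)
        for w in tokenize(obs[f]):
            if w in vocab:  # unknown words are dropped
                counts[vocab[w]] += 1
        vector.extend(counts)
    return vector


obs = {
    "o_nar": "With great effort, you open the window far enough to allow entry.",
    "o_inv": "You are empty-handed.",
    "o_desc": "You are behind the white house. A path leads into the forest to the east. "
              "In one corner of the house there is a small window which is slightly ajar.",
}
vocab = build_vocab([obs])
x = encode(obs, vocab)
print(len(x) == 3 * len(vocab))  # True
```

Keeping the three fields in separate slots, rather than merging them into one bag of words, preserves where a word appeared: "window" in the narrative and "window" in the room description land in different positions of the vector.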

Microsoft evaluated TDQN and DRRN across a diverse set of 32 games, including Zork I. The results showed that the Jericho agents accumulated higher scores than the other agents, even when dealing with action spaces as large as 98 million actions. The success of these learning agents indicates that Jericho is effective at reducing the difficulty of IF games and making them more accessible for reinforcement learning agents to learn and improve language-based skills. However, none of the agents even came close to human-level performance.

Jericho is an important step towards making language a central part of reinforcement learning models. Some of the initial experiments showed that human cognitive skills like common sense and deduction remain a high barrier for AI agents. However, Jericho suggests that language-based training could be the key to unlocking those capabilities.

