Microsoft Open Sources Jericho to Train Reinforcement Learning Using Linguistic Games

Category: IT Technology · Published: 6 years ago


The new framework provides an OpenAI-like environment for language-based games.


Language is one of the hallmarks of human intelligence and one that plays a key role in our learning processes. By using language, we constantly formulate our understanding of a situation in a specific context. Many of the magical abilities of the human brain, such as commonsense reasoning, deduction, and inference, are regularly expressed via language. It is hard to imagine any form of advanced artificial intelligence (AI) that wouldn't rely on language to express its interactions with a given environment. In recent years, reinforcement learning has shown some promise in helping AI agents learn their own languages for communication. To facilitate rapid development of language-based agents, Microsoft Research has open sourced Jericho, a learning environment that leverages language games to train reinforcement learning agents.

The idea of using language games to develop knowledge makes intuitive sense. Just as babies develop language by regularly interacting with objects, we can build AI environments with the right incentives to develop communication in reinforcement learning agents. Among the different game theory models in the language learning space, Jericho relies on a recent discipline known as interactive fiction (IF) games.

IF Games

In computer science, IF games are defined as software environments in which players observe textual descriptions of a simulated world, issue text actions, and receive scores as they progress through the story. From that perspective, IF games are fully text-based simulation environments in which a player issues text commands to effect change in the environment and progress through the story. IF games like Zork I have achieved incredible levels of popularity. To illustrate how IF games can be applied to train AI agents, consider the game illustrated in the following figure: a reinforcement learning agent could learn to interact with an environment by issuing language commands and receiving textual descriptions of the new state.

[Figure: a reinforcement learning agent interacting with an IF game by issuing text commands and receiving textual descriptions]

Like many natural language processing (NLP) tasks, IF games require natural language understanding, but unlike most NLP tasks, IF games are sequential decision-making problems in which actions change the subsequent world states of the game, and choices made early in a game may have long-term effects on the eventual endings. Furthermore, IF games come with their own set of natural language learning challenges:

· Combinatorial Action Space: Most reinforcement learning models have been designed to operate on either discrete or continuous action spaces. However, IF games require the agent to operate in the combinatorial action space of natural language. For example, an agent generating a four-word sentence from a modest vocabulary of 700 words is effectively exploring a space of 700⁴ ≈ 240 billion possible actions.

· Commonsense Reasoning: Due to the lack of graphics, IF games rely on the player’s commonsense knowledge as a prior for how to interact with the game world. For example, a human player encountering a locked chest intuitively understands that the chest needs to be unlocked with some type of key, and once unlocked, the chest can be opened and will probably contain useful items.

· Knowledge Representation: IF games span many distinct locations, each with unique descriptions, objects, and characters. Players move between locations by issuing navigational commands like "go west." Because connectivity between locations is not necessarily Euclidean, agents need to detect when a navigational action has succeeded or failed and whether the location reached was previously seen or new. Beyond location connectivity, it's also helpful to keep track of the objects present at each location, with the understanding that objects can be nested inside other objects, such as food in a refrigerator or a sword in a chest.
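The combinatorial action space arithmetic above is easy to verify; here is a quick sketch in Python (the vocabulary size and sentence length are the figures quoted in the text):

```python
# Size of the combinatorial action space for four-word commands
# drawn from a 700-word vocabulary (one word chosen per slot).
VOCAB_SIZE = 700
SENTENCE_LENGTH = 4

action_space = VOCAB_SIZE ** SENTENCE_LENGTH
print(f"{action_space:,}")  # 240,100,000,000 (roughly 240 billion)
```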

These challenges need to be addressed in order to make IF games a viable mechanism for training reinforcement learning agents.

Enter Microsoft Jericho

Jericho is a Python-based learning environment built on IF games. You can think of Jericho as the OpenAI Gym of language learning. Jericho is optimized for reinforcement learning models and provides capabilities such as game-state persistence, which could help enable capabilities such as memory in reinforcement learning agents. To make IF games more accessible and address some of the challenges mentioned in the previous section, Jericho includes the following features:
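To give a feel for the Gym-like interface the article describes, here is a minimal toy sketch of a text environment with a reset/step loop. The class, room names, and reward rule are hypothetical stand-ins for illustration; this is not Jericho's actual API.

```python
# A toy text environment with a Gym-like reset/step loop, illustrating the
# kind of interface Jericho exposes for IF games. All names here are
# hypothetical; this is not Jericho's actual API.

class ToyTextEnv:
    def __init__(self):
        # connectivity: room -> {direction: destination}
        self.rooms = {
            "backyard": {"west": "kitchen"},
            "kitchen": {"east": "backyard"},
        }

    def reset(self):
        self.location = "backyard"
        self.score = 0
        return f"You are in the {self.location}."

    def step(self, action):
        reward = 0
        if action.startswith("go "):
            destination = self.rooms[self.location].get(action.split()[1])
            if destination:
                self.location = destination
                reward = 1  # reward progress toward the goal room
        self.score += reward
        done = self.location == "kitchen"
        return f"You are in the {self.location}.", reward, done

env = ToyTextEnv()
print(env.reset())          # You are in the backyard.
print(env.step("go west"))  # ('You are in the kitchen.', 1, True)
```

The agent only ever sees text in and text out, which is exactly what makes IF games a natural reinforcement learning testbed.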

· World-Object-Tree Representation: Because of the large number of locations, objects, and characters in many games and the possibility of puzzles requiring objects not present in the current location, agents need to develop ways to remember and reason about previous interactions. World-object-tree representations of the game state enumerate these elements.

· Fixed Random Seed to Enforce Determinism: By making games deterministic, where subsequent states are a direct result of a specific action taken by an agent, Jericho enables the use of targeted exploration algorithms like Go-Explore, which systematically build and expand a library of the visited states.

· Load/save Functionality: This feature enables restoration of previous game states, enabling the use of planning algorithms like Monte-Carlo tree search.

· World-Change Detection and Valid-Action Identification: This feature provides feedback on the success or failure of an agent's last action to effect a change in the game state. Furthermore, Jericho can perform a search to identify valid actions: those that lead to changes in the game state.
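The load/save functionality in the list above is what makes lookahead planning possible: snapshot the state, simulate a candidate action, then restore. The sketch below illustrates the idea with a plain dictionary and a made-up transition rule; Jericho's real state objects work differently, but the save/simulate/restore pattern is the same.

```python
# Sketch of lookahead planning via load/save state: snapshot the game state,
# simulate an action, then restore. The state dictionary and the transition
# rule are illustrative stand-ins, not Jericho's real state representation.
import copy

def step(state, action):
    # hypothetical transition: opening the window scores a point
    if action == "open window":
        state["score"] += 1
    return state

state = {"location": "behind house", "score": 0}
snapshot = copy.deepcopy(state)         # save the current state
lookahead = step(state, "open window")  # simulate one candidate action
print(lookahead["score"])               # 1
state = snapshot                        # restore: planning leaves no trace
print(state["score"])                   # 0
```

This save/restore loop is the primitive that planning algorithms like Monte-Carlo tree search are built on.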


The current version of Jericho includes two learning agents: Template-DQN (TDQN) and the deep reinforcement relevance network (DRRN). TDQN models are typically more effective in parser-based games, handling the combinatorial action space by generating verb-object actions from a predefined set of verbs and objects. DRRN models are better suited to choice-based games, which present a series of choices at every state of the game. Jericho provides a common encoder so that both models have a consistent input representation. While both agents utilize the common input representation, they differ in their methods of action selection. DRRN uses Jericho's valid-action identification to estimate a Q-value for each of the valid actions. It then either acts greedily by selecting the action with the highest Q-value or explores by sampling from the distribution of valid actions.
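The DRRN selection rule described above can be sketched in a few lines: act greedily on Q-values with probability 1 − ε, otherwise sample from a softmax over the valid actions. The Q-values and action strings below are made up for illustration.

```python
# Sketch of DRRN-style action selection over a set of valid actions: exploit
# the highest Q-value with probability 1 - epsilon, otherwise sample from a
# softmax distribution over the Q-values. Values here are illustrative.
import math
import random

def select_action(q_values, epsilon=0.1, rng=random):
    if rng.random() >= epsilon:                     # exploit
        return max(q_values, key=q_values.get)
    actions = list(q_values)                        # explore
    weights = [math.exp(q_values[a]) for a in actions]
    return rng.choices(actions, weights=weights)[0]

q = {"open window": 2.3, "go west": 0.4, "take leaflet": -1.0}
print(select_action(q, epsilon=0.0))  # open window (always greedy at epsilon=0)
```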

[Figure: Jericho's common input representation built from Onar, Oinv, and Odesc]

In the previous diagram, we can see elements such as Onar, Oinv, and Odesc, which encode the textual input. After each command, Jericho agents use a common input representation that includes the current textual observation Onar, the inventory text Oinv, and the current location description Odesc (as given by a "look" command). If we use the command "open window" in the popular text-based game Zork I, Jericho will generate the following representation.

· Onar: With great effort, you open the window far enough to allow entry.

· Oinv: You are empty-handed.

· Odesc: You are behind the white house. A path leads into the forest to the east. In one corner of the house there is a small window which is slightly ajar.
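Assembling these three components into a single observation before encoding can be sketched as follows; the separator token is an illustrative assumption, not part of Jericho's specification.

```python
# Sketch of assembling the common input representation: the narrative
# response Onar, inventory text Oinv, and location description Odesc are
# joined into one observation string before encoding. The [SEP] token is
# an illustrative choice, not Jericho's actual format.
o_nar = "With great effort, you open the window far enough to allow entry."
o_inv = "You are empty-handed."
o_desc = ("You are behind the white house. A path leads into the forest to "
          "the east. In one corner of the house there is a small window "
          "which is slightly ajar.")

observation = " [SEP] ".join([o_nar, o_inv, o_desc])
print(observation.count("[SEP]"))  # 2
```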

Microsoft evaluated TDQN and DRRN across a diverse set of 32 games, including Zork I. The results showed that the Jericho agents accumulated higher scores than the other agents, even when dealing with action spaces as large as 98 million actions. The success of these learning agents demonstrates that Jericho is effective at reducing the difficulty of IF games and making them more accessible for RL agents to learn and improve language-based skills. However, none of the agents came close to human-level performance.

Jericho is an important step towards making language a central part of reinforcement learning models. Some of the initial experiments showed that human cognitive skills like common sense and deduction remain a high barrier for AI agents. However, Jericho showed that language-based training could be the key to unlocking those capabilities.

