Microsoft Open Sources Jericho to Train Reinforcement Learning Using Linguistic Games

Category: IT Technology · Published: 6 years ago


The new framework provides an OpenAI-like environment for language-based games.


Language is one of the hallmarks of human intelligence and one that plays a key role in our learning processes. By using language, we constantly formulate our understanding of a situation in a specific context. Many of the magical abilities of the human brain, such as commonsense reasoning, deduction, and inference, are regularly expressed via language. It is hard to imagine any form of advanced artificial intelligence (AI) that wouldn't rely on language to express its interactions with a given environment. In recent years, reinforcement learning has shown some promise in helping AI agents learn their own languages for communication. To facilitate rapid development of language-based agents, Microsoft Research has open sourced Jericho, a learning environment that leverages language games to train reinforcement learning agents.

The idea of using language games to develop knowledge makes intuitive sense. Just as babies develop language by regularly interacting with objects, we can build AI environments with the right incentives to develop communication in reinforcement learning agents. Among the different game theory models in the language learning space, Jericho relies on a recent discipline known as interactive fiction (IF) games.

IF Games

In computer science, IF games are defined as software environments in which players observe textual descriptions of a simulated world, issue text actions, and receive scores as they progress through the story. From that perspective, IF games are fully text-based simulation environments in which a player issues text commands to effect change in the environment and progress through the story. IF games like Zork I have achieved incredible levels of popularity. To illustrate how IF games can be applied to train AI agents, consider the game illustrated in the following figure: a reinforcement learning agent could learn to interact with an environment by issuing language commands and receiving textual descriptions of the new state.

[Figure: a reinforcement learning agent interacting with an IF game by issuing text commands and receiving textual descriptions]

Like many natural language processing (NLP) tasks, IF games require natural language understanding, but unlike most NLP tasks, IF games are sequential decision-making problems in which actions change the subsequent world states of the game, and choices made early in a game may have long-term effects on the eventual endings. Furthermore, IF games come with their own set of natural language learning challenges:

· Combinatorial Action Space: Most reinforcement learning models have been designed to operate on either discrete or continuous action spaces. However, IF games require the agent to operate in the combinatorial action space of natural language. For example, an agent generating a four-word sentence from a modest vocabulary of 700 words is effectively exploring a space of 700⁴ ≈ 240 billion possible actions.

· Commonsense Reasoning: Due to the lack of graphics, IF games rely on the player’s commonsense knowledge as a prior for how to interact with the game world. For example, a human player encountering a locked chest intuitively understands that the chest needs to be unlocked with some type of key, and once unlocked, the chest can be opened and will probably contain useful items.

· Knowledge Representation: IF games span many distinct locations, each with unique descriptions, objects, and characters. Players move between locations by issuing navigational commands like "go west." Because connectivity between locations is not necessarily Euclidean, agents need to detect when a navigational action has succeeded or failed and whether the location reached was previously seen or new. Beyond location connectivity, it's also helpful to keep track of the objects present at each location, with the understanding that objects can be nested inside other objects, such as food in a refrigerator or a sword in a chest.
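The combinatorial action space arithmetic above is easy to verify; here is a quick sketch in Python (the vocabulary size and sentence length are the figures quoted in the text):

```python
# Size of the combinatorial action space for four-word commands
# drawn from a 700-word vocabulary (one word chosen per slot).
VOCAB_SIZE = 700
SENTENCE_LENGTH = 4

action_space = VOCAB_SIZE ** SENTENCE_LENGTH
print(f"{action_space:,}")  # 240,100,000,000 (roughly 240 billion)
```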

These challenges need to be addressed in order to make IF games a viable mechanism for training reinforcement learning agents.

Enter Microsoft Jericho

Jericho is a Python-based learning environment built on IF games. You can think of Jericho as the OpenAI Gym of language learning. Jericho is optimized for reinforcement learning models and provides capabilities such as game-state persistence, which could help enable capabilities such as memory in reinforcement learning agents. To make IF games more accessible and address some of the challenges mentioned in the previous section, Jericho includes the following features:
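To give a feel for the Gym-like interface the article describes, here is a minimal toy sketch of a text environment with a reset/step loop. The class, room names, and reward rule are hypothetical stand-ins for illustration; this is not Jericho's actual API.

```python
# A toy text environment with a Gym-like reset/step loop, illustrating the
# kind of interface Jericho exposes for IF games. All names here are
# hypothetical; this is not Jericho's actual API.

class ToyTextEnv:
    def __init__(self):
        # connectivity: room -> {direction: destination}
        self.rooms = {
            "backyard": {"west": "kitchen"},
            "kitchen": {"east": "backyard"},
        }

    def reset(self):
        self.location = "backyard"
        self.score = 0
        return f"You are in the {self.location}."

    def step(self, action):
        reward = 0
        if action.startswith("go "):
            destination = self.rooms[self.location].get(action.split()[1])
            if destination:
                self.location = destination
                reward = 1  # reward progress toward the goal room
        self.score += reward
        done = self.location == "kitchen"
        return f"You are in the {self.location}.", reward, done

env = ToyTextEnv()
print(env.reset())          # You are in the backyard.
print(env.step("go west"))  # ('You are in the kitchen.', 1, True)
```

The agent only ever sees text in and text out, which is exactly what makes IF games a natural reinforcement learning testbed.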

· World-Object-Tree Representation: Because of the large number of locations, objects, and characters in many games and the possibility of puzzles requiring objects not present in the current location, agents need to develop ways to remember and reason about previous interactions. World-object-tree representations of the game state enumerate these elements.

· Fixed Random Seed to Enforce Determinism: By making games deterministic, where subsequent states are a direct result of a specific action taken by an agent, Jericho enables the use of targeted exploration algorithms like Go-Explore, which systematically build and expand a library of the visited states.

· Load/save Functionality: This feature enables restoration of previous game states, enabling the use of planning algorithms like Monte-Carlo tree search.

· World-Change Detection and Valid-Action Identification: This feature provides feedback on the success or failure of an agent's last action to effect a change in the game state. Furthermore, Jericho can perform a search to identify valid actions: those that lead to changes in the game state.
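The load/save functionality in the list above is what makes lookahead planning possible: snapshot the state, simulate a candidate action, then restore. The sketch below illustrates the idea with a plain dictionary and a made-up transition rule; Jericho's real state objects work differently, but the save/simulate/restore pattern is the same.

```python
# Sketch of lookahead planning via load/save state: snapshot the game state,
# simulate an action, then restore. The state dictionary and the transition
# rule are illustrative stand-ins, not Jericho's real state representation.
import copy

def step(state, action):
    # hypothetical transition: opening the window scores a point
    if action == "open window":
        state["score"] += 1
    return state

state = {"location": "behind house", "score": 0}
snapshot = copy.deepcopy(state)         # save the current state
lookahead = step(state, "open window")  # simulate one candidate action
print(lookahead["score"])               # 1
state = snapshot                        # restore: planning leaves no trace
print(state["score"])                   # 0
```

This save/restore loop is the primitive that planning algorithms like Monte-Carlo tree search are built on.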


The current version of Jericho includes two learning agents: Template-DQN (TDQN) and the deep reinforcement relevance network (DRRN). TDQN models are typically more effective in parser-based games, handling the combinatorial action space by generating verb-object actions from a predefined set of verbs and objects. DRRN models are better suited to choice-based games, which present a series of choices at every state of the game. Jericho provides a common encoder so that both models have a consistent input representation. While both agents utilize the common input representation, they differ in their methods of action selection. DRRN uses Jericho's valid-action identification to estimate a Q-value for each of the valid actions. It then either acts greedily by selecting the action with the highest Q-value or explores by sampling from the distribution of valid actions.
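The DRRN selection rule described above can be sketched in a few lines: act greedily on Q-values with probability 1 − ε, otherwise sample from a softmax over the valid actions. The Q-values and action strings below are made up for illustration.

```python
# Sketch of DRRN-style action selection over a set of valid actions: exploit
# the highest Q-value with probability 1 - epsilon, otherwise sample from a
# softmax distribution over the Q-values. Values here are illustrative.
import math
import random

def select_action(q_values, epsilon=0.1, rng=random):
    if rng.random() >= epsilon:                     # exploit
        return max(q_values, key=q_values.get)
    actions = list(q_values)                        # explore
    weights = [math.exp(q_values[a]) for a in actions]
    return rng.choices(actions, weights=weights)[0]

q = {"open window": 2.3, "go west": 0.4, "take leaflet": -1.0}
print(select_action(q, epsilon=0.0))  # open window (always greedy at epsilon=0)
```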

[Figure: Jericho's common input representation built from Onar, Oinv, and Odesc]

In the previous diagram, we can see elements such as Onar, Oinv, and Odesc, which encode the textual input. After each command, Jericho agents use a common input representation that includes the current textual observation Onar, the inventory text Oinv, and the current location description Odesc (as given by a "look" command). If we use the command "open window" in the popular text-based game Zork I, Jericho will generate the following representation.

· Onar: With great effort, you open the window far enough to allow entry.

· Oinv: You are empty-handed.

· Odesc: You are behind the white house. A path leads into the forest to the east. In one corner of the house there is a small window which is slightly ajar.
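Assembling these three components into a single observation before encoding can be sketched as follows; the separator token is an illustrative assumption, not part of Jericho's specification.

```python
# Sketch of assembling the common input representation: the narrative
# response Onar, inventory text Oinv, and location description Odesc are
# joined into one observation string before encoding. The [SEP] token is
# an illustrative choice, not Jericho's actual format.
o_nar = "With great effort, you open the window far enough to allow entry."
o_inv = "You are empty-handed."
o_desc = ("You are behind the white house. A path leads into the forest to "
          "the east. In one corner of the house there is a small window "
          "which is slightly ajar.")

observation = " [SEP] ".join([o_nar, o_inv, o_desc])
print(observation.count("[SEP]"))  # 2
```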

Microsoft evaluated TDQN and DRRN across a diverse set of 32 games, including Zork I. The results showed that the Jericho agents accumulated higher scores than the other agents, even when dealing with action spaces as large as 98 million actions. The success of these learning agents demonstrates that Jericho is effective at reducing the difficulty of IF games and making them more accessible for RL agents to learn and improve language-based skills. However, none of the agents came close to human-level performance.

Jericho is an important step towards making language a central part of reinforcement learning models. Some of the initial experiments showed that human cognitive skills like common sense and deduction remain a high barrier for AI agents. However, Jericho showed that language-based training could be the key to unlocking those capabilities.

