Microsoft Open Sources Jericho to Train Reinforcement Learning Using Linguistic Games


The new framework provides an OpenAI Gym-like environment for language-based games.


Language is one of the hallmarks of human intelligence and one that plays a key role in our learning processes. By using language, we constantly formulate our understanding of a situation in a specific context. Many of the magical abilities of the human brain, such as commonsense reasoning, deduction, or inference, are regularly expressed via language. It is hard to imagine any form of advanced artificial intelligence (AI) that wouldn't rely on language to express its interactions with a given environment. In recent years, reinforcement learning has shown some promise in helping AI agents learn their own languages for communication. To facilitate rapid development of language-based agents, Microsoft Research has open sourced Jericho, a learning environment that leverages language games to train reinforcement learning agents.

The idea of using language games to develop knowledge makes intuitive sense. Just like babies develop language by regularly interacting with objects, we can build AI environments with the right incentives to develop communication in reinforcement learning agents. Among the different game models in the language learning space, Jericho relies on a genre known as interactive fiction (IF) games.

IF Games

In computer science, IF games are defined as software environments in which players observe textual descriptions of the simulated world, issue text actions, and receive a score as they progress through the story. From that perspective, IF games are fully text-based simulation environments where a player issues text commands to effect change in the environment and progress through the story. IF games like Zork I have achieved incredible levels of popularity. To illustrate how IF games can be applied to train AI agents, consider the game illustrated in the following figure: a reinforcement learning agent could learn to interact with an environment by issuing language commands and receiving textual descriptions of the new state.

[Figure: a sample IF game interaction, with the agent issuing text commands and receiving textual observations]
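To make this interaction loop concrete, here is a minimal sketch using Jericho's Python API. It assumes the pip-installable jericho package and a locally available Zork I ROM (zork1.z5), which is not bundled with the library; exact signatures may vary between versions.

```python
# Minimal sketch: driving an IF game programmatically via Jericho.
# Assumes `pip install jericho` and a local zork1.z5 ROM file.
from jericho import FrotzEnv

env = FrotzEnv('zork1.z5')           # load the game
obs, info = env.reset()              # initial textual description
print(obs)

# Issue a text command exactly as a human player would.
obs, reward, done, info = env.step('open mailbox')
print(obs)                           # textual description of the new state
print('score:', info['score'], 'moves:', info['moves'])
```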

Like many natural language processing (NLP) tasks, IF games require natural language understanding, but unlike most NLP tasks, IF games are sequential decision-making problems in which actions change the subsequent world states of the game, and choices made early in a game may have long-term effects on the eventual endings. Furthermore, IF games come with their own set of natural language learning challenges:

· Combinatorial Action Space: Most reinforcement learning models have been designed to operate on either discrete or continuous action spaces. However, IF games require the agent to operate in the combinatorial action space of natural language. For example, an agent generating a four-word sentence from a modest vocabulary of 700 words is effectively exploring a space of 700^4 = 240 billion possible actions (see the short calculation after this list).

· Commonsense Reasoning: Due to the lack of graphics, IF games rely on the player’s commonsense knowledge as a prior for how to interact with the game world. For example, a human player encountering a locked chest intuitively understands that the chest needs to be unlocked with some type of key, and once unlocked, the chest can be opened and will probably contain useful items.

· Knowledge Representation: IF games span many distinct locations, each with unique descriptions, objects, and characters. Players move between locations by issuing navigational commands like "go west". Because connectivity between locations is not necessarily Euclidean, agents need to detect when a navigational action has succeeded or failed and whether the location reached was previously seen or new. Beyond location connectivity, it is also helpful to keep track of the objects present at each location, with the understanding that objects can be nested inside other objects, such as food in a refrigerator or a sword in a chest.
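As a quick sanity check on the combinatorial action space figure quoted above, the arithmetic is easy to reproduce. The verb and object counts in the second half of this snippet are illustrative assumptions, not numbers from the paper:

```python
# Back-of-the-envelope size of the naive action space:
# every 4-word command over a 700-word vocabulary.
vocab_size = 700
sentence_length = 4
naive_actions = vocab_size ** sentence_length
print(f'{naive_actions:,}')    # 240,100,000,000 -- roughly 240 billion

# A template-based scheme (verb + object), as TDQN uses, is far smaller.
verbs, objects = 200, 700      # illustrative sizes only
print(f'{verbs * objects:,}')  # 140,000 verb-object combinations
```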

These challenges need to be addressed in order to make IF games a viable mechanism for training reinforcement learning agents.

Enter Microsoft Jericho

Jericho is a Python-based learning environment built on IF games. You can think of Jericho as the OpenAI Gym of language learning. Jericho is optimized for reinforcement learning models and provides capabilities such as game state persistence that could help enable memory in reinforcement learning agents. In order to make IF games more accessible and address some of the challenges mentioned in the previous section, Jericho includes the following features:

· World-Object-Tree Representation: Because of the large number of locations, objects, and characters in many games and the possibility of puzzles requiring objects not present in the current location, agents need to develop ways to remember and reason about previous interactions. World-object-tree representations of the game state enumerate these elements.

· Fixed Random Seed to Enforce Determinism: By making games deterministic, where subsequent states are a direct result of a specific action taken by an agent, Jericho enables the use of targeted exploration algorithms like Go-Explore, which systematically build and expand a library of the visited states.

· Load/Save Functionality: This feature supports restoring previous game states, enabling the use of planning algorithms like Monte-Carlo tree search.

· World-Change Detection and Valid-Action Identification: This feature provides feedback on the success or failure of an agent's last action to effect a change in the game state. Furthermore, Jericho can perform a search to identify valid actions, i.e., those that lead to changes in the game state. Both this and the load/save feature are exercised in the sketch after this list.
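Here is a hedged sketch of how the load/save and valid-action features could be exercised together, under the same assumptions as the earlier snippet (the jericho package and a local zork1.z5 ROM):

```python
# Sketch: valid-action identification plus save/restore in Jericho.
from jericho import FrotzEnv

env = FrotzEnv('zork1.z5')
obs, info = env.reset()

# Valid-action identification: candidate commands that actually
# change the game state from the current position.
valid = env.get_valid_actions()
print(valid)                  # e.g. ['open mailbox', 'north', ...]

# Load/save: snapshot the emulator state, explore a branch, then
# roll back -- the primitive that planners like Monte-Carlo tree
# search rely on.
snapshot = env.get_state()
env.step(valid[0])            # try one branch
env.set_state(snapshot)       # restore; the game is back where it was
```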


The current version of Jericho includes two learning agents: Template-DQN (TDQN) and the deep reinforcement relevance network (DRRN). TDQN models are typically more effective in parser-based games; they handle the combinatorial action space by generating verb-object actions from pre-defined sets of verbs and objects. DRRN models are better suited to choice-based games, which present a series of choices at every state of the game. Jericho provides a common encoder to produce a consistent input representation for both models. While both agents use this common input representation, they differ in their methods of action selection. DRRN uses Jericho's valid-action identification to estimate a Q-value for each of the valid actions. It then either acts greedily by selecting the action with the highest Q-value or explores by sampling from the distribution of valid actions.
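The action-selection step just described can be sketched as follows. The q_value callable is a hypothetical stand-in for the trained DRRN network, and the softmax sampling is one plausible reading of "sampling from the distribution of valid actions":

```python
import math
import random

def select_action(valid_actions, q_value, explore=False):
    """Choose among Jericho's valid actions given a Q-value estimator.

    `q_value` stands in for the trained DRRN network: any callable
    mapping an action string to a scalar estimate (hypothetical here).
    """
    qs = [q_value(a) for a in valid_actions]
    if not explore:
        # Greedy: take the action with the highest estimated Q-value.
        return valid_actions[max(range(len(qs)), key=qs.__getitem__)]
    # Explore: sample from a softmax distribution over the Q-values.
    exps = [math.exp(q) for q in qs]
    total = sum(exps)
    return random.choices(valid_actions,
                          weights=[e / total for e in exps])[0]
```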

[Figure: Jericho's common input representation, built from Onar, Oinv, and Odesc]

In the previous diagram, we can see the factors Onar, Oinv, and Odesc as the elements used to encode the textual input. After each command, Jericho agents use a common input representation that includes the current textual observation Onar, the inventory text Oinv, and the current location description Odesc (as given by a look command). If we issue the command "open window" in the popular text-based game Zork I, Jericho will generate the following representation:

· Onar: With great effort, you open the window far enough to allow entry.

· Oinv: You are empty-handed.

· Odesc: You are behind the white house. A path leads into the forest to the east. In one corner of the house there is a small window which is slightly ajar.
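One plausible way to assemble these three strings with Jericho's save/restore primitives is sketched below. Since look and inventory would consume a game turn, the sketch snapshots and restores the emulator state around them; the function name and structure are illustrative, not part of Jericho's API:

```python
def observe(env, last_obs):
    """Build the (Onar, Oinv, Odesc) triple for a FrotzEnv `env`,
    given the narrative text returned by the last command."""
    o_nar = last_obs                       # narrative observation
    snapshot = env.get_state()             # save emulator state
    o_inv, _, _, _ = env.step('inventory') # inventory text
    env.set_state(snapshot)                # undo the extra turn
    o_desc, _, _, _ = env.step('look')     # location description
    env.set_state(snapshot)                # undo again
    return o_nar, o_inv, o_desc
```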

Microsoft evaluated TDQN and DRRN across a diverse set of 32 games, including Zork I. The results showed that the Jericho agents accumulated higher scores than the other agents, even when dealing with action spaces as large as 98 million actions. The success of these learning agents demonstrates that Jericho is effective at reducing the difficulty of IF games and making them more accessible for reinforcement learning agents to learn and improve language-based skills. However, none of the agents came close to human-level performance.

Jericho is an important step towards making language a central part of reinforcement learning models. Some of the initial experiments showed that human cognitive skills like common sense and deduction remain a high barrier for AI agents. However, Jericho showed that language-based training could be the key to unlocking those capabilities.

