Neural Fictitious Self-Play in Practice

Category: IT Technology · Published: 5 years ago


This article describes an implementation of Neural Fictitious Self-Play (NFSP) for Leduc Hold’em poker, based on code by Eric Steinberger. The full source code can be found on his GitHub repository.

If you are new to the topic, it is better to start with these articles:

Introduction to Fictitious Play

Fictitious Self Play

Neural Fictitious Self-Play

Disclaimer: this article does not aim to explain every detail of the implementation code, but rather to highlight its main ideas and the mapping between the code and the academic solution.

The implementation involves distributed computation, which adds a level of complexity to the code. In this article, however, we focus on the algorithm itself and set aside the distributed-computation aspect.

To do so, we will draw a parallel with the academic algorithm shown below.

[Figure: NFSP algorithm from the Heinrich/Silver paper]
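In the Heinrich/Silver formulation, at the start of each episode the agent follows its ε-greedy best response (derived from the Q-network) with probability η, the anticipatory parameter, and its average policy Π otherwise. A minimal sketch of that action-selection rule follows. The two networks are replaced by hypothetical plain functions `q_values` and `avg_policy`, and none of the names below come from Steinberger's code.

```python
import random
from typing import Callable, Sequence

# Hypothetical stand-ins for the two NFSP networks:
#   q_values(state)   -> one Q-value per action (best-response network Q)
#   avg_policy(state) -> one probability per action (average-policy network Pi)
QFn = Callable[[object], Sequence[float]]
PiFn = Callable[[object], Sequence[float]]


def choose_policy_for_episode(eta: float, rng: random.Random) -> str:
    """Pick the policy for a whole episode: best response with probability
    eta (the anticipatory parameter), average policy otherwise."""
    return "best_response" if rng.random() < eta else "average"


def act(state, mode: str, q_values: QFn, avg_policy: PiFn,
        epsilon: float, rng: random.Random) -> int:
    """Return an action index under the policy chosen for this episode."""
    if mode == "best_response":
        qs = q_values(state)
        if rng.random() < epsilon:
            return rng.randrange(len(qs))              # explore uniformly
        return max(range(len(qs)), key=lambda a: qs[a])  # greedy w.r.t. Q
    # Average policy: sample an action from Pi's probabilities.
    probs = avg_policy(state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In the paper, (s, a) pairs are written to the supervised-learning memory only on best-response episodes, which is what lets Π approximate the average of the agent's past best responses.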

Leduc Hold’em

First, let’s define Leduc Hold’em game.

Here is a definition taken from DeepStack-Leduc. It reads:

Leduc Hold’em is a toy poker game sometimes used in academic research (first introduced in Bayes’ Bluff: Opponent Modeling in Poker). It is played with a deck of six cards, comprising two suits of three ranks each (often the king, queen, and jack — in our implementation, the ace, king, and queen). The game begins with each player being dealt one card privately, followed by a betting round. Then, another card is dealt faceup as a community (or board) card, and there is another betting round. Finally, the players reveal their private cards. If one player’s private card is the same rank as the board card, he or she wins the game; otherwise, the player whose private card has the higher rank wins.
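To make the showdown rule concrete, here is a small sketch of the deck and winner determination, using the ace/king/queen ranks mentioned above. The suit names and the tie convention are our assumptions: with only two cards per rank, both players can never pair the board, and if their private cards share a rank neither can pair it, so we report a tie (split pot) as -1.

```python
from itertools import product
from typing import List, Tuple

RANKS = ["Q", "K", "A"]   # ascending strength; list index = rank strength
SUITS = ["s", "h"]        # two suits (names are arbitrary)
Card = Tuple[str, str]    # (rank, suit)


def make_deck() -> List[Card]:
    """Six cards: two suits x three ranks."""
    return [(rank, suit) for suit, rank in product(SUITS, RANKS)]


def showdown(p0: Card, p1: Card, board: Card) -> int:
    """Return 0 or 1 for the winning player, or -1 for a tie."""
    p0_pairs = p0[0] == board[0]
    p1_pairs = p1[0] == board[0]
    if p0_pairs != p1_pairs:
        # Exactly one player matches the board card's rank: that player wins.
        return 0 if p0_pairs else 1
    # Otherwise the higher private card wins; equal ranks split the pot.
    r0, r1 = RANKS.index(p0[0]), RANKS.index(p1[0])
    if r0 == r1:
        return -1
    return 0 if r0 > r1 else 1
```

A hand can then be dealt with `random.sample(make_deck(), 3)`, taking the first two cards as the private cards and the third as the board card.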

Global View

The main class lives in workers\driver\Driver.py; its run() method sets everything in motion.

It sets up the main loop and executes the algorithm at each iteration, as seen in the following image.

[Figure: Driver.py]
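Driver.py itself is not reproduced here. As a rough, hypothetical outline of what such a main loop does on each iteration, the sketch below passes every phase in as a callback. The function name `run`, its parameters, and the ordering of the phases are illustrative assumptions, not the repository's API.

```python
from typing import Callable, Dict


def run(n_iterations: int,
        play_episodes: Callable[[], None],
        update_best_response: Callable[[], None],
        update_avg_policy: Callable[[], None],
        evaluate: Callable[[int], float],
        eval_every: int = 100) -> Dict[int, float]:
    """Hypothetical outline of an NFSP driver's main loop."""
    results: Dict[int, float] = {}
    for it in range(1, n_iterations + 1):
        play_episodes()          # generate self-play experience into the memories
        update_best_response()   # RL step on the best-response (Q) network
        update_avg_policy()      # SL step on the average-policy network
        if it % eval_every == 0:
            results[it] = evaluate(it)  # e.g. an exploitability estimate
    return results
```

Passing the phases as callbacks keeps the loop independent of how each phase is implemented, which is roughly the separation the article draws between the driver and the algorithm itself.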

The Algorithm

The bulk of the work happens in _HighLevelAlgo.py, where the different parts of the academic solution are easy to distinguish.
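One part of the paper that maps directly onto code is the pair of replay memories: M_RL, a circular buffer of recent transitions used to train the best-response network, and M_SL, filled by reservoir sampling so that it holds a uniform random sample of all best-response (s, a) pairs seen so far. A minimal sketch of both follows; the class names are our own choice and are not taken from _HighLevelAlgo.py.

```python
import random
from collections import deque
from typing import Any, List


class CircularBuffer:
    """M_RL: keeps only the most recent transitions; the oldest are dropped first."""

    def __init__(self, capacity: int):
        self.data = deque(maxlen=capacity)

    def add(self, item: Any) -> None:
        self.data.append(item)

    def sample(self, k: int, rng: random.Random) -> List[Any]:
        return rng.sample(list(self.data), k)


class ReservoirBuffer:
    """M_SL: reservoir sampling keeps a uniform sample of every item ever added."""

    def __init__(self, capacity: int, rng: random.Random):
        self.capacity = capacity
        self.data: List[Any] = []
        self.n_seen = 0
        self.rng = rng

    def add(self, item: Any) -> None:
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            # Keep the new item with probability capacity / n_seen.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = item
```

The circular buffer favours recent experience, which suits Q-learning against an opponent whose strategy keeps changing, while the reservoir keeps old and new behaviour equally represented, which is what training an average policy requires.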
