Neural Fictitious Self-Play in Practice

Category: IT Technology · Published: 6 years ago


This article describes an implementation of Neural Fictitious Self-Play (NFSP) for the Leduc Hold’em poker game, based on code by Eric Steinberger. The full source code can be found in his GitHub repository.

If you are new to the topic, it is better to start with these articles:

Introduction to Fictitious Play

Fictitious Self Play

Neural Fictitious Self-Play

Disclaimer: this article does not aim to explain every detail of the implementation, but rather to highlight its main ideas and to map the code onto the academic algorithm.

The implementation uses distributed computation, which adds a level of complexity to the code. In this article we focus on the algorithm itself and leave the distributed computation aspect aside.

To that end, we will draw a parallel with the academic algorithm shown below.

Figure: NFSP algorithm from the Heinrich/Silver paper
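As a rough illustration of the policy-mixing step in the Heinrich/Silver algorithm, here is a minimal Python sketch. At the start of each episode the agent plays its epsilon-greedy best response with probability eta (the anticipatory parameter) and its average policy otherwise, and only best-response actions are recorded for supervised learning. All function names here are hypothetical and are not taken from Eric Steinberger's code; plain lists of Q-values and probabilities stand in for the neural networks.

```python
import random


def choose_policy(eta, rng):
    """Pick the policy for one episode: best response with probability eta."""
    return "best_response" if rng.random() < eta else "average"


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the argmax of q_values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sample_from(probs, rng):
    """Sample an action index from a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def act(mode, q_values, avg_probs, epsilon, rng):
    """Return (action, store_in_sl_memory).

    q_values and avg_probs are assumed to be the outputs of the Q-network
    and the average-policy network for the current state.
    """
    if mode == "best_response":
        # Best-response behaviour is what the average policy learns to imitate.
        return epsilon_greedy(q_values, epsilon, rng), True
    return sample_from(avg_probs, rng), False
```

In a full agent, every transition would also go to the reinforcement learning memory regardless of which policy produced it; that part is left out of this sketch.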

Leduc Hold’em

First, let’s define the game of Leduc Hold’em.

Here is a definition taken from DeepStack-Leduc. It reads:

Leduc Hold’em is a toy poker game sometimes used in academic research (first introduced in Bayes’ Bluff: Opponent Modeling in Poker). It is played with a deck of six cards, comprising two suits of three ranks each (often the king, queen, and jack; in our implementation, the ace, king, and queen). The game begins with each player being dealt one card privately, followed by a betting round. Then, another card is dealt faceup as a community (or board) card, and there is another betting round. Finally, the players reveal their private cards. If one player’s private card is the same rank as the board card, he or she wins the game; otherwise, the player whose private card has the higher rank wins.
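The showdown rule in this definition can be sketched in a few lines of Python. This is only an illustration, not code from the repository: cards are represented by rank alone (suits do not affect the outcome), the jack/queen/king ranks are assumed, and treating equal private ranks as a tie is an assumption, since the quoted definition does not cover that case.

```python
# Assumed rank ordering: jack < queen < king.
RANKS = {"J": 0, "Q": 1, "K": 2}


def showdown_winner(private_0, private_1, board):
    """Return the winning player's index (0 or 1), or None for a tie.

    With only two cards of each rank in the deck, at most one player
    can match the rank of the board card.
    """
    if private_0 == board:
        return 0
    if private_1 == board:
        return 1
    if RANKS[private_0] == RANKS[private_1]:
        return None  # assumption: equal ranks split the pot
    return 0 if RANKS[private_0] > RANKS[private_1] else 1
```

For example, a jack beats a king when the board card is a jack, because pairing the board outranks any unpaired card.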

Global View

The main class lives in workers\driver\Driver.py; its run() method sets everything in motion.

It sets up the main loop and executes the algorithm at each iteration, as seen in the following image.

Figure: Driver.py

The Algorithm

The bulk of the action happens in _HighLevelAlgo.py, where the different parts of the academic solution are easy to distinguish.
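Two of the parts of the academic solution worth looking for are the agent's two memories: a circular buffer for reinforcement learning transitions and a reservoir buffer for the supervised learning data. The sketch below is not taken from _HighLevelAlgo.py; it is a minimal, single-process illustration with hypothetical names, using standard reservoir sampling so that every item seen so far has the same chance of being in the buffer.

```python
import random
from collections import deque


class ReservoirBuffer:
    """Fixed-size buffer holding a uniform sample of all items seen so far."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of items ever offered to the buffer
        self.rng = rng or random.Random()

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Keep the new item with probability capacity / seen,
        # overwriting a uniformly chosen slot.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


def make_circular_buffer(capacity):
    """Circular buffer: once full, the oldest transition is dropped first."""
    return deque(maxlen=capacity)
```

The circular buffer favours recent experience, which suits learning a best response to the opponents' current behaviour, while the reservoir keeps a sample of the agent's whole history, which is what an average policy needs.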

