Neural Fictitious Self-Play in Practice

Category: IT Technology · Published: 4 years ago

This article describes an implementation of Neural Fictitious Self-Play (NFSP) for the poker game Leduc Hold’em, based on code by Eric Steinberger. The full source code can be found in his GitHub repository.

If you are new to the topic, it is better to start with these articles first:

Introduction to Fictitious Play

Fictitious Self Play

Neural Fictitious Self-Play

Disclaimer: this article does not aim to explain every detail of the implementation, but rather to highlight its main components and how the code maps to the academic algorithm.

The implementation involves distributed computation, which adds a level of complexity to the code. In this article, however, we focus on the algorithm itself and bypass the distributed computation aspect.

For this purpose, we will draw a parallel with the academic algorithm below.

Figure: NFSP algorithm from the Heinrich and Silver paper
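
The agent-side logic of that algorithm can be sketched in a few lines of Python. This is an illustration under assumptions, not Eric Steinberger's code: the names ReservoirBuffer and NFSPAgentSketch are hypothetical, eta stands for the anticipatory parameter from the paper, and the two neural networks are replaced by plain callables. It shows only how an agent picks its policy for an episode and which memory each experience goes to.

```python
import random
from collections import deque


class ReservoirBuffer:
    """Fixed-size memory filled by reservoir sampling (hypothetical helper)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0
        self.rng = rng or random.Random()

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / n_seen.
            slot = self.rng.randrange(self.n_seen)
            if slot < self.capacity:
                self.items[slot] = item


class NFSPAgentSketch:
    """Policy mixing and memory bookkeeping only; no learning happens here."""

    def __init__(self, best_response, average_policy, eta=0.1,
                 rl_capacity=1000, sl_capacity=1000, seed=0):
        self.best_response = best_response    # stands in for the epsilon-greedy Q-network policy
        self.average_policy = average_policy  # stands in for the average-policy network
        self.eta = eta
        self.rng = random.Random(seed)
        self.rl_memory = deque(maxlen=rl_capacity)  # circular buffer of transitions
        self.sl_memory = ReservoirBuffer(sl_capacity, self.rng)
        self.use_best_response = False

    def start_episode(self):
        # With probability eta play the best response, otherwise the average policy.
        self.use_best_response = self.rng.random() < self.eta

    def act(self, state):
        if self.use_best_response:
            action = self.best_response(state)
            # Only best-response behaviour is stored for supervised learning.
            self.sl_memory.add((state, action))
        else:
            action = self.average_policy(state)
        return action

    def observe(self, state, action, reward, next_state, done):
        # Every transition goes to the RL memory, whichever policy produced it.
        self.rl_memory.append((state, action, reward, next_state, done))
```

The reservoir buffer matters because the supervised memory is meant to approximate the average of all past best responses, so old samples must stay as likely to survive as new ones; a plain circular buffer would forget them.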

Leduc Hold’em

First, let’s define the game of Leduc Hold’em.

Here is a definition taken from DeepStack-Leduc. It reads:

Leduc Hold’em is a toy poker game sometimes used in academic research (first introduced in Bayes’ Bluff: Opponent Modeling in Poker). It is played with a deck of six cards, comprising two suits of three ranks each (often the king, queen, and jack; in our implementation, the ace, king, and queen). The game begins with each player being dealt one card privately, followed by a betting round. Then, another card is dealt face up as a community (or board) card, and there is another betting round. Finally, the players reveal their private cards. If one player’s private card is the same rank as the board card, he or she wins the game; otherwise, the player whose private card has the higher rank wins.
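
The showdown rule in that definition is small enough to write out directly. The following is a minimal sketch, not code from the repository: the function name showdown_winner and the representation of cards by rank letter are assumptions made for illustration. The quoted definition does not say what happens when both private cards have the same rank; treating that case as a tie is also an assumption here.

```python
# Rank order for the ace/king/queen deck mentioned in the definition.
RANK_ORDER = {"Q": 0, "K": 1, "A": 2}


def showdown_winner(private_ranks, board_rank):
    """Return 0 or 1 for the winning player, or None for a tie (assumed)."""
    p0, p1 = private_ranks
    # A private card that pairs the board card wins outright.
    if p0 == board_rank and p1 != board_rank:
        return 0
    if p1 == board_rank and p0 != board_rank:
        return 1
    # Otherwise the higher private rank wins.
    if RANK_ORDER[p0] > RANK_ORDER[p1]:
        return 0
    if RANK_ORDER[p1] > RANK_ORDER[p0]:
        return 1
    return None  # equal ranks: not covered by the quoted rules


print(showdown_winner(("Q", "A"), "Q"))  # player 0 pairs the board
print(showdown_winner(("K", "A"), "Q"))  # no pair, player 1 holds the higher card
```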

Global View

The main class lives in workers\driver\Driver.py; its run() method sets everything in motion.

It sets up the main loop and executes the algorithm at each iteration, as seen in the following image.

Figure: Driver.py

The Algorithm

The bulk of the action happens in _HighLevelAlgo.py, where it is easy to distinguish the different parts of the academic algorithm.
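
As a rough orientation, the parts of the academic algorithm that such a high-level routine has to sequence can be sketched as follows. This is not the content of _HighLevelAlgo.py: the function run_nfsp_iterations, its parameters, and the four callables are hypothetical placeholders, and the distributed aspects are left out entirely. The steps themselves (generate data, train the best-response network, train the average-policy network, periodically refresh the target network) follow the Heinrich and Silver algorithm.

```python
def run_nfsp_iterations(n_iterations, play_episodes, train_best_response,
                        train_average_policy, sync_target_network,
                        target_update_every=10):
    """Call the four (hypothetical) steps of one NFSP iteration in order."""
    for iteration in range(1, n_iterations + 1):
        play_episodes()          # agents act and fill the RL and SL memories
        train_best_response()    # reinforcement learning step on the RL memory
        train_average_policy()   # supervised learning step on the SL memory
        if iteration % target_update_every == 0:
            sync_target_network()  # periodic copy of Q-network weights to the target
```

Passing the steps in as callables keeps the schedule separate from the learning code, so each part can be swapped or logged without touching the loop.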

