Multi-Agent RL: Nash Equilibria and Friend or Foe Q-Learning

Category: IT Technology · Published: 4 years ago


Making robots tip the scales


Photo by Toa Heftiba on Unsplash

For whatever reason, humans innately possess the ability to collaborate. It’s become so commonplace that its nuances slip right under our noses. How do we just know how to coordinate when moving a heavy couch? How do we decide to split up in a grocery store to save time? How are we able to observe others’ actions and understand how best to respond?

Here’s an interpretation: we reach a balance. An equilibrium. Each person takes actions that not only best complement the others’ but together achieve the task at hand most efficiently. This notion of equilibrium comes up often in game theory and extends to multi-agent RL (MARL). In this article, we explore two algorithms, Nash Q-Learning and Friend or Foe Q-Learning, both of which attempt to find multi-agent policies that fulfill this idea of “balance.” We assume basic knowledge of single-agent formulations and Q-learning.


Photo by Erik Mclean on Unsplash

What Makes an Optimal Policy…Optimal?

Multi-agent learning environments are typically represented as stochastic games. Each agent aims to find a policy that maximizes its own expected discounted reward. Together, the overall goal is to find a joint policy that gathers the most reward for each agent. This joint reward is defined below in the form of a value function:
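One standard way to write such a value function is sketched here. The notation is an assumption, not taken from the text: n agents, agent i following policy \pi^{i}, a discount factor \gamma in [0, 1), and r^{i}_{t} as agent i's reward at time step t.

```latex
v^{i}(s, \pi^{1}, \dots, \pi^{n})
  = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r^{i}_{t}
    \,\middle|\, s_{0} = s,\ \pi^{1}, \dots, \pi^{n} \right]
```

Read this as: the value to agent i of starting in state s is its expected discounted reward when every agent, itself included, follows its part of the joint policy.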

This goal applies to both competitive and collaborative situations. Agents can find policies that best counter or complement those of the others. We call this optimal joint policy the Nash Equilibrium. More formally, it is a joint policy with the following property:
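In symbols, the property is usually stated along these lines. The notation is assumed for illustration: v^{i} is agent i's expected discounted reward, \pi_{*}^{i} is agent i's equilibrium policy, and \pi^{i} is any alternative policy agent i could follow instead.

```latex
v^{i}(s, \pi_{*}^{1}, \dots, \pi_{*}^{i}, \dots, \pi_{*}^{n})
  \;\ge\;
v^{i}(s, \pi_{*}^{1}, \dots, \pi^{i}, \dots, \pi_{*}^{n})
\quad \text{for all } \pi^{i},\ \text{all states } s,\ \text{and every agent } i
```

Only agent i's policy differs between the two sides; everyone else stays at the equilibrium policy.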

At first, it seems like we’re beating a dead horse. The best policy gathers the most reward, so what?

Underneath all the fancy Greek letters and notation, the Nash Equilibrium tells us a bit more. It says that each agent’s policy in a Nash Equilibrium is the best response to the other agents’ equilibrium policies. No agent is incentivized to change its policy because no unilateral tweak yields more reward. In other words, all of the agents are at a standstill. Deadlocked. Kind of trapped in a sense.


Photo by NeONBRAND on Unsplash

To give an example, imagine a competitive game between two small robots: C3PO and Wall-E. During each round, they each choose a number from one through ten, and whoever selects the higher number wins. As expected, both pick the number ten every time, as neither robot wants to risk losing. If C3PO were to choose any other number, he would lose against Wall-E’s optimal policy of always choosing ten, and vice versa. In other words, the two are at an equilibrium.
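The robots' game is small enough to check by brute force. The sketch below assumes a payoff of +1 for a win, -1 for a loss, and 0 for a tie (the tie rule is an assumption, since the example does not specify one), and the function names are hypothetical. It enumerates every pair of choices and keeps those where each robot's number is a best response to the other's.

```python
import itertools

ACTIONS = range(1, 11)  # each robot picks a number from 1 through 10


def payoff(mine, theirs):
    """Reward to the robot choosing `mine`: +1 win, -1 loss, 0 tie (assumed)."""
    return (mine > theirs) - (mine < theirs)


def is_best_response(mine, theirs):
    """True if no alternative choice does strictly better against `theirs`."""
    return all(payoff(mine, theirs) >= payoff(alt, theirs) for alt in ACTIONS)


def pure_nash_equilibria():
    """All pairs (a, b) where a and b are best responses to each other."""
    return [
        (a, b)
        for a, b in itertools.product(ACTIONS, repeat=2)
        if is_best_response(a, b) and is_best_response(b, a)
    ]


print(pure_nash_equilibria())  # prints [(10, 10)]
```

Under these assumptions, the only pair that survives is (10, 10): against a ten, any other number loses, so neither robot gains by deviating on its own.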

