An Introduction to Unity ML-Agents
Train a reinforcement learning agent to jump over walls.
This article is part of a new free course on Deep Reinforcement Learning with Unity, where we'll create agents with TensorFlow that learn to play video games using the Unity game engine. Check the syllabus here.
If you have never studied Deep Reinforcement Learning before, you should first check the free course Deep Reinforcement Learning with Tensorflow.
The past few years have witnessed breakthroughs in reinforcement learning (RL). From the first successful use of RL by a deep learning model for learning a policy from pixel input in 2013 to the OpenAI Dexterity program in 2019, we live in an exciting moment in RL research.
Consequently, as RL researchers we need to create more and more complex environments, and Unity helps us do that. The Unity ML-Agents Toolkit is a plugin for the Unity game engine that lets us use Unity as an environment builder to train agents.
From playing football, learning to walk, and jumping over big walls, to training a cute doggy to catch sticks, the Unity ML-Agents Toolkit provides a ton of amazing pre-made environments.
Furthermore, during this free course, we will also create new learning environments.
For now, we'll learn how Unity ML-Agents works, and at the end of the article you'll train an agent that learns to jump over walls.
But first, there are some requirements:
- This is not an introductory course on reinforcement learning. If you don't have any background in Deep Reinforcement Learning, you should first check the free course Deep Reinforcement Learning with Tensorflow.
- Moreover, this is not a course about Unity, so you need some basic Unity skills. If that's not the case, you should definitely check out their amazing course for beginners: Create with Code.
So let’s get started!
How does Unity ML-Agents work?
What’s Unity ML-Agents?
Unity ML-Agents is a new plugin for the game engine Unity that allows us to create or use pre-made environments to train our agents.
It's developed by Unity Technologies, the makers of Unity, one of the best game engines ever. Unity is used by the creators of Firewatch, Gone Home, Cuphead, and a lot of AAA games.
The three components
With Unity ML-Agents, you have three important components.
The first is the Learning Component (on the Unity side), which contains the Unity scene and the environment elements.
The second is the Python API, which contains the RL algorithms (such as PPO and SAC). We use this API to launch training, to run tests, and so on. It communicates with the Learning Component through the third component, the External Communicator.
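To make this more concrete, here is a minimal sketch of driving an environment from the Python side using the low-level API of a recent ML-Agents release. The exact module and method names vary between versions, so treat it as an illustration rather than the definitive API of the version you have installed:

from mlagents_envs.environment import UnityEnvironment

# file_name=None connects to the Unity Editor: start this script, then press Play.
env = UnityEnvironment(file_name=None)
env.reset()

# Each trainable Behavior (for example "SmallWallJump") is exposed by name.
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    # Agents that request a decision on this step appear in decision_steps.
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # We just sample random actions here; a trainer such as PPO would use its policy.
    env.set_actions(behavior_name, spec.action_spec.random_action(len(decision_steps)))
    env.step()

env.close()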
Inside the Learning Component
Inside the Learning Component, we have different elements:
The first is the Agent, the actor of the scene. The Agent is what we're going to train, by optimizing its policy (the function that tells us what action to take in each state), called the Brain.
Finally, there is the Academy. This element orchestrates the agents and their decision-making process. Think of the Academy as a maestro that handles the requests from the Python API.
To better understand its role, let's recall the RL process. It can be modeled as a loop that works like this:
The RL loop (source: Sutton and Barto's book)
Now, let’s imagine an agent learning to play a platform game. The RL process looks like this:
- Our agent receives state S0 from the environment — we receive the first frame of our game (environment).
- Based on the state S0, the agent takes an action A0 — our agent will move to the right.
- The environment transitions to a new state S1.
- The environment gives a reward R1 to the agent: we're not dead (positive reward +1).
This RL loop outputs a sequence of state, action, and reward. The goal of the agent is to maximize the expected cumulative reward.
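As a rough sketch in code (env and policy here are hypothetical placeholders, not the ML-Agents API), the loop looks like this:

# Generic agent-environment loop with hypothetical env and policy objects.
state = env.reset()                          # receive S0
done = False
total_reward = 0.0
while not done:
    action = policy(state)                   # choose A_t based on S_t
    state, reward, done = env.step(action)   # environment returns S_{t+1} and R_{t+1}
    total_reward += reward                   # the agent tries to maximize this cumulative sum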
In fact, the Academy is the one that sends the orders to our Agents and ensures that they stay in sync (a rough sketch of this loop follows the list):
- Collect Observations
- Select your action using your policy
- Take the Action
- Reset if you reached the max step or if you’re done.
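In pseudocode (the helper names below are hypothetical; the real logic lives in the C# Agent and Academy classes and in the Python trainer), one step of this orchestration could look like this:

# Hypothetical sketch of the Academy keeping every agent in sync at each step.
def academy_step(agents, policy, max_step):
    observations = [agent.collect_observations() for agent in agents]
    actions = policy(observations)            # decided by the Brain / Python trainer
    for agent, action in zip(agents, actions):
        agent.take_action(action)
        if agent.done or agent.step_count >= max_step:
            agent.reset()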
Train an agent to jump over walls
So now that we understand how Unity ML-Agents works, let's train an agent to jump over walls.
We published our trained models on GitHub; you can download them here.
The Wall Jump Environment
The goal in this environment is to train our agent to reach the green tile.
However, there are 3 situations:
- In the first, there is no wall and our agent just needs to reach the green tile.
- In the second situation, the agent needs to learn to jump in order to reach the green tile.
- Finally, in the hardest situation, the wall is too high for our agent to jump over directly, so it needs to push the white block against the wall and climb on it in order to jump over.
We'll learn two different policies, depending on the height of the wall:
- The first, SmallWallJump, will be learned in the no-wall and small-wall situations.
- The second, BigWallJump , will be learned during the high wall situations.
The reward system is:
In terms of observations, we don't use normal vision (frames) but 14 raycasts that can each detect 4 possible objects. Think of raycasts as lasers that detect whether they pass through an object.
We also use the agent's global position and whether or not it is grounded.
The action space is discrete, with 4 branches: forward motion, rotation, side motion, and jump.
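To illustrate what a branched discrete action looks like, here is a tiny sketch: the agent receives one integer per branch. The specific integer encodings in the comments are assumptions for illustration, not values read from the WallJump agent script:

import numpy as np

# One row per agent, one integer per branch (shape: num_agents x num_branches).
# Assumed encodings, for illustration only:
action = np.array([[1,    # branch 0 (forward motion): 0 = stay, 1 = forward, 2 = backward
                    0,    # branch 1 (rotation): 0 = none, 1 = rotate left, 2 = rotate right
                    2,    # branch 2 (side motion): 0 = none, 1 = left, 2 = right
                    1]])  # branch 3 (jump): 0 = don't jump, 1 = jump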
Our goal is to hit the benchmark with a mean reward of 0.8.
Let’s jump!
First of all, let’s open the UnitySDK project.
In the Examples folder, search for WallJump and open the scene.
In the scene you can see a lot of Agents; each of them comes from the same Prefab and they all share the same Brain.
In fact, just as in classical Deep Reinforcement Learning we launch multiple instances of a game (for instance 128 parallel environments), here we do the same by copying and pasting the agents, in order to gather more varied states.
So, first, because we want to train our agent from scratch, we need to remove the brains from the agent. We need to go to the prefabs folder and open the Prefab.
Now in the Prefab hierarchy, select the Agent and go into the inspector.
First, in Behavior Parameters, we need to remove the Model. If you have some GPUs you can change Inference Device from CPU to GPU.
Then in Wall Jump Agent Component, we need to remove the brains for No Wall Brain, Small Wall Brain, and Big Wall Brain situations.
Now that you’ve done that you’re ready to train your agent from scratch.
For this first training, we’ll just modify the total training steps for the two policies (SmallWallJump and BigWallJump) because we can hit the benchmark in only 300k training steps.
To do that, open config/trainer_config.yaml and set max_steps to 3e5 for both the SmallWallJump and the BigWallJump configurations.
To train this agent, we will use PPO (Proximal Policy Optimization). If you don't know about it or need to refresh your knowledge, check my article.
We saw that to train this agent, we need to call our External Communicator using the Python API. This External Communicator will then ask the Academy to start the agents.
So, open your terminal, go to where ml-agents-master is located, and type this:
mlagents-learn config/trainer_config.yaml --run-id="WallJump_FirstTrain" --train
It will ask you to run the Unity scene: press the Play button at the top of the Editor.
You can monitor your training by launching Tensorboard using this command:
tensorboard --logdir=summaries
Watching your agent jumping over walls
You can watch your agent during the training by looking at the game window.
When the training is finished you need to move the saved model files contained in ml-agents-master/models to UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels .
Then open the Unity Editor again and select the WallJump scene.
Select the WallJumpArea prefab object and open it.
Select Agent .
In the Agent's Behavior Parameters, drag the SmallWallJump.nn file to the Model placeholder.
Drag the SmallWallJump.nn file to No Wall Brain Placeholder.
Drag the SmallWallJump.nn file to Small Wall Brain Placeholder.
Drag the BigWallJump.nn file to Big Wall Brain Placeholder.
Then, press the Play button at the top of the Editor and voila!
Time for some experiments
We’ve just trained our agents to learn to jump over walls. Now that we have good results we can make some experiments.
Remember that the best way to learn is to be active by experimenting. So you should try to make some hypotheses and verify them.
Reducing the discount factor gamma to 0.95
We know that:
- The larger the gamma, the smaller the discount. This means the learning agent cares more about the long term reward.
- On the other hand, the smaller the gamma, the bigger the discount. This means our agent cares more about the short term reward.
The idea behind this experiment was that if we increase the discount by decreasing gamma from 0.99 to 0.95, our agent will care more about the short-term reward, and maybe that will help it converge faster to an optimal policy.
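A quick back-of-the-envelope calculation shows how big this change is for rewards that lie far in the future, which is exactly the situation in BigWallJump, where pushing the block only pays off many steps later:

# Value today of a reward received 50 steps in the future, for both discount factors.
for gamma in (0.99, 0.95):
    print(f"gamma={gamma}: {gamma ** 50:.3f}")
# gamma=0.99: 0.605  -> a distant reward still matters a lot
# gamma=0.95: 0.077  -> a distant reward is almost invisible to the agent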
Something interesting to see is that our agent performs about the same on the Small Wall Jump. We can explain that by the fact that this situation is quite easy: the agent just needs to move towards the green tile and jump if there is a small wall.
On the other hand, it performed really badly on the Big Wall Jump. We can explain that by the fact that our new agent cares more about the short-term reward: it was unable to think long term, and thus didn't really understand that it needed to push the white brick in order to be able to jump over the wall.
Increasing the complexity of the Neural Network
For this third and last training, the hypothesis was: will our agent become smarter if we increase the network's complexity?
What we did was increase the size of the hidden layers (the hidden_units hyperparameter) from 256 to 512.
But we found that this new agent performs worse than our first agent.
This suggests that we don't need to increase the complexity of our network for this type of simple problem, because doing so only increases the training time needed to converge.
That’s all for today! You’ve just trained an agent that learns to jump over walls. Awesome!
Don't forget to experiment: change some hyperparameters, try new things. Have fun!
If you want to compare with our experimentations, we published our trained models here.
In the next article, we'll train a smarter agent that needs to press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top. To do that, in addition to extrinsic rewards, we'll use curiosity as an intrinsic reward.
If you have any thoughts, comments, questions, feel free to comment below or send me an email: hello@simoninithomas.com, or tweet me @ThomasSimonini .
Keep learning, stay awesome!