Docker + TensorFlow + Google Cloud Platform = Love

Category: IT Technology · Published: 4 years ago

Make your life easier by Dockerising your TensorFlow

Docker, TensorFlow and Google Cloud Platform logos. Heart by Bohdan Burmich from the Noun Project.

Docker changed my engineering life. I have learnt to love that whale!

When I first installed TensorFlow with GPU support on my Windows laptop years ago, I was horrified at how complex and fragile the process was. I had to repeat this horrific process when I started dual booting Ubuntu on my laptop. I had to relive my past traumas when I got a GPU for my desktop.

What if there was an OS-agnostic way of running TensorFlow that would get you up and running in a matter of minutes?

This is the focus of this post! We will be using a Google Cloud Platform (GCP) Compute Engine VM as our machine. But you could easily replace this VM with your very own laptop/desktop with an NVIDIA GPU.

Note: I will assume that you have a GCP account and that you have the Cloud SDK installed so that you can run gcloud commands from your terminal.

Topics we’ll be visiting

Here’s an overview of the topics that this article will cover:

GPU quotas
VM startup scripts
TensorFlow Dockerfile
Cloud Build
Container Registry

Let’s do this!

Do ya got some GPU quota?

When you first get started in GCP, you aren’t allocated a GPU to play with. If you try to make a VM with a GPU with insufficient quota, you’ll get an error telling you that your quota has been exceeded. So let’s fix this right now.

Go to:

IAM & Admin -> Quotas

In the Metrics drop-down, first click None.

Search for GPUs (all regions) in the text box and click on the result that appears.

Tick the box next to that entry in the list and then click on EDIT QUOTAS.

Complete the form that appears on the right of your screen and request at least one GPU.

Now we wait for our approval to come through. This should be quick - I was approved in less than 2 minutes!
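Once the request is approved, the new limit can also be read from the terminal. The sketch below assumes that the project-level quota is reported under the metric name GPUS_ALL_REGIONS and that gcloud's csv output puts metric, limit and usage on one line. The helper name gpu_quota_limit is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "metric,limit,usage" CSV lines on stdin and
# prints the limit for the GPUS_ALL_REGIONS metric (nothing if it is absent).
gpu_quota_limit() {
  awk -F, '$1 == "GPUS_ALL_REGIONS" { print $2 }'
}

# Intended usage (needs an authenticated Cloud SDK, so it is commented out):
# gcloud compute project-info describe \
#   --flatten="quotas[]" \
#   --format="csv[no-heading](quotas.metric,quotas.limit,quotas.usage)" \
#   | gpu_quota_limit
```

Keeping the parsing in a small function means it can be tried on sample text before pointing it at a real project.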

Building the VM

Once we have increased our quota, we can get to building a VM with at least one GPU. To accomplish this, we could either go into Compute Engine in the UI, or we could learn how to use GCP's Cloud SDK. Let's do the latter!

Say that we want to create a VM in the zone us-west1-b named deep-docker. Assuming we have installed the Cloud SDK, we can issue this command in our terminal:

gcloud compute instances create deep-docker \
	--zone=us-west1-b \
	--accelerator="type=nvidia-tesla-k80,count=1" \
	--image-family "ubuntu-1804-lts" \
	--image-project "ubuntu-os-cloud" \
	--boot-disk-device-name="persistent-disk" \
	--boot-disk-size=100GB \
	--boot-disk-type=pd-standard \
	--machine-type=n1-standard-4 \
	--maintenance-policy=TERMINATE \
	--metadata-from-file startup-script=./startup.sh

Don’t worry about the --metadata-from-file startup-script=... argument for now. We will explore this in the next section.
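Not every zone offers every GPU model, so it can be worth checking before creating the VM. The sketch below assumes the accelerator name nvidia-tesla-k80 and the zone us-west1-b from the command above. The helper name zone_offers_gpu is made up for illustration, and the gcloud line is commented out because it needs an authenticated Cloud SDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads accelerator type names on stdin (one per line)
# and succeeds if the requested type is among them.
zone_offers_gpu() {
  local wanted="$1"
  grep -qxF -- "$wanted"
}

# Intended usage:
# gcloud compute accelerator-types list \
#   --filter="zone:us-west1-b" --format="value(name)" \
#   | zone_offers_gpu nvidia-tesla-k80 && echo "available"
```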

Why have we chosen Ubuntu when we can create a VM with a container using gcloud compute instances create-with-container? Good question! This command creates a VM with a Container-Optimized OS based on Chromium OS. It’s a lot more complex to install NVIDIA drivers on such a VM, so we make our lives easier by choosing Ubuntu instead. If you’re keen to stick with the Container-Optimized OS, then see this repo for a GPU driver installation solution.

Before we can issue this command, we need to have a startup script present in our current directory. Let’s find out what this startup script is all about!

The startup script

Here is the full startup script.

The startup script takes care of a bunch of tricky things:

  • It installs Docker and sets gcloud as the Docker credential helper. This will allow us to pull the Docker image that we’ll be building later from GCP’s Container Registry.
  • It installs NVIDIA drivers onto the VM.
  • It installs the NVIDIA Container Toolkit, which will allow our Docker container to access the GPUs on our VM.
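The three steps above could be laid out roughly as follows. This is a sketch, not the author's script: the function names, the LOGIN_USER default of whale (borrowed from the SSH command later in the post), the use of Docker's convenience installer and of ubuntu-drivers are all assumptions. Setting up NVIDIA's apt repository is left out because it changes between releases. By default the script only prints what it would run; set DRY_RUN=0 to execute the commands as root.

```shell
#!/usr/bin/env bash
# Sketch of a startup script covering the three steps described above.
# Assumptions: Ubuntu 18.04, run as root, NVIDIA's apt repository already set up.
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands
LOGIN_USER="${LOGIN_USER:-whale}"  # assumed login user

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_docker() {
  # Docker's convenience installer, then let gcloud handle registry logins.
  run sh -c 'curl -fsSL https://get.docker.com | sh'
  run usermod -aG docker "$LOGIN_USER"
  run sudo -H -u "$LOGIN_USER" gcloud auth configure-docker --quiet
}

install_nvidia_driver() {
  run apt-get update
  run apt-get install -y ubuntu-drivers-common
  run ubuntu-drivers autoinstall
}

install_container_toolkit() {
  # Lets containers started with --gpus reach the host's GPUs.
  run apt-get install -y nvidia-container-toolkit
  run systemctl restart docker
}

main() {
  install_docker
  install_nvidia_driver
  install_container_toolkit
}

main
```

Routing every command through run makes it possible to review the plan on a laptop before letting it loose on a fresh VM.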

Let’s finally issue our command and wait for our VM to finish building.

You can track the progress of the startup script by SSH-ing into your machine:

gcloud compute ssh whale@deep-docker --zone=us-west1-b

Once in your VM, issue this and watch your log stream:

tail -f /var/log/syslog

At some point, you should see something like this:

Apr 12 08:09:49 deep-docker startup-script: INFO Finished running startup scripts.
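If you would rather not watch the log by eye, a small check can look for that line instead. The sketch below assumes the log message keeps the wording shown above. The function name startup_finished is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds once the given log file contains the
# "Finished running startup scripts" line shown in the log excerpt above.
startup_finished() {
  local log_file="${1:-/var/log/syslog}"
  grep -q 'startup-script: INFO Finished running startup scripts' "$log_file"
}

# Example polling loop, to be run on the VM (commented out so that
# sourcing this file has no side effects):
# until startup_finished; do sleep 10; done; echo "startup script done"
```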

And this is where you can dance a little celebratory dance. The hardest part of this process is over!

Making the startup script run once

An issue with our startup script is that it is run each time our VM boots up. If we frequently reboot our VMs, this will get unnecessarily time-consuming.

One way to make sure that our script is run once only is to remove it from our VM’s metadata using the gcloud CLI:

gcloud compute instances remove-metadata deep-docker --keys=startup-script

Another way to accomplish this is to follow the suggestion from here. This is the approach that I have taken. In the startup script, you will see that most of it is enclosed in an if statement:

if test ! -f "$STARTUP_SUCCESS_FILE"; then
	...
	touch /home/$LOGIN_USER/.ran-startup-script
else
	echo "$STARTUP_SUCCESS_FILE exists. not running startup script!"
fi

We decide whether to run the body of our startup script based on whether a file named .ran-startup-script exists in a particular location. Upon the first boot, that file does not exist, so the body of the if statement is executed. If all goes well in our first boot of our VM, the .ran-startup-script should get created by the touch line, above. On the second boot onwards, all the time-consuming parts of our startup script won’t get executed. We can check /var/log/syslog to confirm that this is the case:

Apr 12 09:05:58 deep-docker startup-script: INFO startup-script: /home/whale/.ran-startup-script exists. not running startup script!
Apr 12 09:05:58 deep-docker startup-script: INFO startup-script: Return code 0.
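The pattern is easy to try out in isolation. Below is a self-contained sketch that uses a temporary directory instead of /home/$LOGIN_USER. The function name run_once and the echo that stands in for the slow installation steps are made up for illustration.

```shell
#!/usr/bin/env bash
# Run the expensive body only when the marker file does not exist yet.
run_once() {
  local marker="$1"
  if test ! -f "$marker"; then
    echo "running startup body"   # stand-in for the slow installation steps
    touch "$marker"               # record success so later boots skip the body
  else
    echo "$marker exists. not running startup script!"
  fi
}

# Simulate two boots against a throwaway marker file.
demo_dir="$(mktemp -d)"
STARTUP_SUCCESS_FILE="$demo_dir/.ran-startup-script"
run_once "$STARTUP_SUCCESS_FILE"   # first boot: the body runs
run_once "$STARTUP_SUCCESS_FILE"   # second boot: the body is skipped
```

Because the marker is only touched at the end of the body, a boot that fails halfway leaves no marker behind and the body is retried on the next boot.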

The Dockerfile

Here is our Dockerfile. It’s super simple!

FROM tensorflow/tensorflow:2.1.0-gpu-py3

We’ll now build this image.

Build the Docker image in the cloud

The TensorFlow image we’re using is about 2GB in size. Instead of building our Docker image locally and pushing it to Container Registry from our local machine, we’ll take advantage of the power of GCP and build it in the cloud!

The image that we will be building will be located at gcr.io/GCP_PROJECT_NAME/SOME_IMAGE_NAME. My project is called learning-deeply. I want to call the image tf-2.1.0-gpu. So I will issue this command in my terminal:

REMOTE_IMAGE_NAME=gcr.io/learning-deeply/tf-2.1.0-gpu \
	&& gcloud builds submit --tag "${REMOTE_IMAGE_NAME}" --timeout=15m

I specify a longer timeout to overcome a timeout issue I was experiencing. Let’s issue our command and watch our build take place!
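To avoid retyping the image location, it can be assembled from its parts. This is a minimal sketch, assuming the gcr.io/PROJECT/IMAGE layout described above. The function name remote_image_name is made up for illustration, and the gcloud lines are commented out because they need an authenticated Cloud SDK and a Dockerfile in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Container Registry location of an image.
remote_image_name() {
  local project="$1" image="$2"
  printf 'gcr.io/%s/%s\n' "$project" "$image"
}

REMOTE_IMAGE_NAME="$(remote_image_name learning-deeply tf-2.1.0-gpu)"
echo "$REMOTE_IMAGE_NAME"

# Submit the build, then list the tags that ended up in the registry:
# gcloud builds submit --tag "$REMOTE_IMAGE_NAME" --timeout=15m
# gcloud container images list-tags "$REMOTE_IMAGE_NAME"
```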

We can monitor the progress of our build in the GCP Console’s Cloud Build section.

Once done, let’s head over to the Container Registry section, where we should see our beautiful image!

Fire up our container and check for GPUs

This is exciting! I see you rubbing your palms in anticipation. Let’s see if our hard work has paid off.

Firstly, let’s SSH into our VM (see the startup script section for how to do this).

Let’s pull our Docker image into our VM! Issue a command similar to this one, replacing the reference to the location of the image with whatever you provided when issuing gcloud builds submit earlier:

docker pull gcr.io/learning-deeply/tf-2.1.0-gpu:latest

As we have taken care of Container Registry authentication in our startup script, this should pull your image from Container Registry.

Next, let’s start up our container. Note that we have a --gpus argument which exposes all of the GPUs on our VM to our container:

docker run -it -d --name tf --gpus all gcr.io/learning-deeply/tf-2.1.0-gpu

Issue docker ps and we should see our container running!

Let’s now execute an interactive Bash shell on our container:

docker exec -it tf bash

You should now be looking at a beautiful shell prompt inside the container.

Now cross your fingers and run this to check if we can access our GPU:

python3 -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"

A bunch of text will be printed. But if you see something like this at the end, you know that you have succeeded, my friends:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
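The same check can be scripted from the VM without opening a shell in the container. This is a sketch, assuming the container is named tf as above. The function name has_gpu is made up for illustration, and the docker line is commented out because it needs the running container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the text on stdin lists at least one
# TensorFlow physical device of type GPU.
has_gpu() {
  grep -q "device_type='GPU'"
}

# Intended usage on the VM:
# docker exec tf python3 -c \
#   "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" \
#   | has_gpu && echo "GPU visible" || echo "no GPU found"
```

An empty list ([]) in the output means TensorFlow started but found no GPU, which usually points at the driver or the --gpus flag rather than at TensorFlow itself.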

Conclusion

Docker changed the way I work. Not only do I use it for my machine learning work, I also use it for my regular data analysis work and to build this site.

If your job title begins with “Data”, do yourself a favour and learn to use it. You might also learn to love the whale!

Until next time,

Justin

