Choosing the right GPU for deep learning on AWS


Amazon Elastic Inference

With Amazon Elastic Inference (EI), rather than selecting a GPU instance to host your models, you can attach low-cost, GPU-powered acceleration to a CPU-only instance via the network.

EI comes in different sizes, so you can add just the amount of GPU processing power your model needs. Because the GPU acceleration is accessed over the network, EI adds some latency compared to a native GPU in, say, a G4 instance, but it will still be much faster than a CPU-only instance if you have a demanding model.

It’s therefore important to define your target latency SLA for your application and work backwards to choose the right accelerator option. Let’s consider a hypothetical scenario below.

Please note all these numbers in the hypothetical scenario are made-up for the purpose of illustration.
Every use case is different. Use the scenario only for general guidance.

Let’s say your application can deliver a good customer experience if your total latency (app + network + model predictions) is under 200 ms. And let’s say, with a G4 instance type you can get total latency down to 40 ms which is well within your target latency. Also, let’s say with a C5 instance type you can only get total latency to 400 ms which does not meet your SLA requirements and results in poor customer experience.

With Elastic Inference, you can network-attach a “slice” or “fraction” of a GPU to a CPU instance such as C5 and get your total latency down to, say, 180 ms, which is under the desired 200 ms mark. Since EI is significantly cheaper than provisioning a dedicated GPU instance, you save on your total deployment costs. A GPU instance like G4 will still deliver the best inference performance, but if the extra performance doesn’t improve your customer experience, you can use EI to stay under the target latency SLA, deliver a good customer experience and save on overall deployment costs.
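As an illustration, with the SageMaker Python SDK you can request an EI accelerator when deploying a model to a CPU-backed endpoint. The sketch below assumes a TensorFlow model artifact already in S3; the bucket path, framework version and accelerator size are placeholders, so check the Elastic Inference documentation for the framework versions and accelerator sizes currently supported.

```python
# Sketch: attach an Elastic Inference accelerator to a CPU-only endpoint
# using the SageMaker Python SDK. The model artifact path, role and
# accelerator size are placeholders for illustration.
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

role = sagemaker.get_execution_role()  # assumes you run this inside SageMaker

model = TensorFlowModel(
    model_data="s3://my-bucket/model/model.tar.gz",  # hypothetical artifact
    role=role,
    framework_version="2.3",  # use an EI-supported framework version
)

# Host on an inexpensive CPU instance and network-attach a slice of GPU power.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",        # CPU-only host
    accelerator_type="ml.eia2.medium",   # Elastic Inference accelerator size
)
```

You would then pick the accelerator size that brings your measured end-to-end latency under your SLA at the lowest cost.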

AWS Inferentia and Amazon EC2 Inf1 Instances

Amazon EC2 Inf1 is the new kid on the block. Inf1 instances give you access to a high-performance inference chip called AWS Inferentia, custom-designed by AWS. AWS Inferentia chips support FP16, BF16 and INT8 for reduced-precision inference. To target AWS Inferentia, you use the AWS Neuron software development kit (SDK) to compile your TensorFlow, PyTorch or MXNet model. The SDK also comes with a runtime library for running the compiled models in production.

Amazon EC2 Inf1 instances deliver better price/performance than GPU-based EC2 instances. You’ll just need to make sure that the AWS Neuron SDK supports all the layers in your model. See here for a list of supported ops for each framework.
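As a rough illustration, compiling a PyTorch model for Inferentia with the torch-neuron package might look like the sketch below. The exact API can vary between Neuron SDK releases, so treat the call names and arguments as assumptions and confirm them against the Neuron documentation.

```python
# Sketch: compile a PyTorch model for AWS Inferentia with the Neuron SDK.
# Requires the torch-neuron package from the AWS Neuron repositories;
# API details may differ by SDK version, so treat this as a starting point.
import torch
import torch_neuron  # provided by the AWS Neuron SDK
from torchvision import models

model = models.resnet50(pretrained=True)
model.eval()

example = torch.rand(1, 3, 224, 224)

# Trace/compile the model; operators the compiler does not support
# fall back to running on the CPU.
neuron_model = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled artifact to load on an Inf1 instance at serving time.
neuron_model.save("resnet50_neuron.pt")
```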

Optimizing for cost

You have a few different options to optimize the cost of your training and inference workloads.

Spot instances: Spot-instance pricing makes high-performance GPUs much more affordable and lets you access spare Amazon EC2 compute capacity at a steep discount compared to On-Demand rates. For an up-to-date list of prices by instance and Region, visit the Spot Instance Advisor. In some cases you can save over 90% on your training costs, but your instances can be preempted and terminated with just two minutes’ notice. Your training scripts must therefore implement frequent checkpointing and the ability to resume training once Spot capacity is restored, as sketched below.
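A minimal checkpoint-and-resume sketch in PyTorch, with a placeholder model, optimizer and checkpoint path, might look like this:

```python
# Sketch: checkpoint frequently and resume after a Spot interruption (PyTorch).
# The checkpoint path, model and optimizer are placeholders for illustration;
# in practice, sync the checkpoint directory to durable storage such as S3.
import os
import torch
import torch.nn as nn

CKPT_PATH = "checkpoints/latest.pt"
os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
start_epoch = 0

# If a checkpoint exists, a previous run was interrupted: restore and resume.
if os.path.exists(CKPT_PATH):
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 100):
    # ... run one epoch of training here ...
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        CKPT_PATH,
    )
```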

Amazon SageMaker managed training: During the development phase, much of your time is spent prototyping, tweaking code and trying different options in your favorite editor or IDE (which is obviously Vim), none of which needs a GPU. You can save costs by simply decoupling your development and training resources, and Amazon SageMaker lets you do this easily. Using the Amazon SageMaker Python SDK you can test your scripts locally on your laptop, desktop, EC2 instance or SageMaker notebook instance.

When you’re ready to train, specify what GPU instance type you want to train on and SageMaker will provision the instances, copy the dataset to the instance, train your model, copy results back to Amazon S3, and tear down the instance. You are only billed for the exact duration of training. Amazon SageMaker also supports managed Spot Training for additional convenience and cost savings.

I’ve written a guide on how to use it here: A quick guide to using Spot instances with Amazon SageMaker
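As an illustration, a managed Spot Training job launched through the SageMaker Python SDK might look like the sketch below. The entry point, IAM role, S3 paths, instance type and framework versions are assumptions to replace with your own, and the argument names follow SDK v2.

```python
# Sketch: launch a managed (Spot) training job with the SageMaker Python SDK.
# entry_point, role, S3 URIs and framework versions are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",              # your local training script
    role=sagemaker.get_execution_role(), # assumes a SageMaker execution context
    instance_count=1,
    instance_type="ml.p3.2xlarge",       # GPU is provisioned only for the job
    framework_version="1.13",
    py_version="py39",
    use_spot_instances=True,             # managed Spot Training
    max_run=3600,                        # cap on training time (seconds)
    max_wait=7200,                       # total time including waits for Spot capacity
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # hypothetical bucket
)

estimator.fit({"training": "s3://my-bucket/data/"})   # hypothetical dataset
```

When the job finishes, SageMaker tears down the instance and you are billed only for the training duration.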

Amazon Elastic Inference and Amazon EC2 Inf1 instances: Save costs for inference workloads by leveraging EI to add just the right amount of GPU acceleration to your CPU instances, or by leveraging cost-effective Amazon EC2 Inf1 instances.

Optimize for cost by improving utilization:

  1. Optimize your training code to take full advantage of the Tensor Cores on P3 and G4 instances by enabling mixed-precision training (see the sketch after this list). Every deep learning framework does this differently, so refer to your framework’s documentation.
  2. Use reduced-precision (INT8) inference on G4 instance types to improve performance. NVIDIA’s TensorRT library provides APIs to convert single-precision models to INT8, and its documentation includes examples.
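As an example of the first item, here is a minimal mixed-precision training loop using PyTorch’s automatic mixed precision (AMP); the model and data are placeholders, a CUDA GPU is assumed, and other frameworks expose equivalent APIs.

```python
# Sketch: mixed-precision training with PyTorch AMP to engage the Tensor Cores
# on P3/G4 GPUs. The model, data and optimizer are placeholders.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()   # scales the loss to avoid FP16 underflow

for step in range(100):
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # run the forward pass in mixed precision
        loss = loss_fn(model(x), y)

    scaler.scale(loss).backward()      # backward pass on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```

The INT8 path on G4 follows a different workflow (TensorRT calibration and engine building), which is covered in NVIDIA’s TensorRT documentation.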

What software should I use on Amazon EC2 GPU instances?

Without optimized software, there is a risk that you’ll under-utilize the hardware resources you provision. You may be tempted to “pip install tensorflow/pytorch”, but I highly recommend using AWS Deep Learning AMIs or AWS Deep Learning Containers (DLC) instead.

AWS qualifies and tests them on all Amazon EC2 GPU instances, and they include AWS optimizations for networking, storage access, and the latest NVIDIA and Intel drivers and libraries. Deep learning frameworks have upstream and downstream dependencies on higher-level schedulers and orchestrators and lower-level infrastructure services. By using the AWS Deep Learning AMIs and DLCs you know the stack has been tested end-to-end and is guaranteed to give you the best performance.

TL;DR

Congratulations! You made it to the end (even if you didn’t read all of the post). I intended this to be a quick guide, but also provide enough context, references and links to learn more. In this final section, I’m going to give you the quick recommendation list. Please take time to read about the specific instance type and GPU type you’re considering to make an informed decision.

The recommendation below is very much my personal opinion based on my experience working with GPUs and deep learning. Caveat emptor.

And (drum roll) here’s the list:

  • Highest performing GPU instance on AWS. Period : p3dn.24xlarge (8 V100 GPUs, 32 GB per GPU)
  • Best single GPU training performance : p3.2xlarge (V100, 16 GB GPU)
  • Best single-GPU instance for developing, testing and prototyping : g4dn.xlarge (T4, 16 GB GPU). Consider g4dn.(2/4/8/16)xlarge for more vCPUs and higher system memory.
  • Best multi-GPU instance for single-node training and running parallel experiments : p3.8xlarge (4 V100 GPUs, 16 GB per GPU), p3.16xlarge (8 V100 GPUs, 16 GB per GPU)
  • Best multi-GPU, multi-node distributed training performance : p3dn.24xlarge (8 V100 GPUs, 32 GB per GPU, 100 Gbps aggregate network bandwidth)
  • Best single-GPU instance for inference deployments : G4 instance type. Choose instance size g4dn.(2/4/8/16)xlarge based on pre- and post-processing steps in your deployed application.
  • I need the most GPU memory I can get for large models : p3dn.24xlarge (8 V100, 32 GB per GPU)
  • I need access to Tensor Cores for mixed-precision training : P3 and G4 instance types. Choose the instance size based on your model size and application.
  • I need access to double precision (FP64) for HPC and deep learning : P3, P2 instance types. Choose the instance size based on your application.
  • I need 8 bit integer precision (INT8) for inference : G4 instance type. Choose instance size based on pre- and post-processing steps in your deployed application.
  • I need access to half precision (FP16) for inference : P3 and G4 instance types. Choose the instance size based on your application.
  • I want GPU acceleration for inference but don’t need a full GPU : Use Amazon Elastic Inference and attach just the right amount of GPU acceleration you need.
  • I want the best performance on any GPU instance : Use AWS Deep Learning AMI and AWS Deep Learning Containers
  • I want to save money : Use Spot Instances and Managed Spot Training on Amazon SageMaker. Choose Amazon Elastic Inference for models that don’t take advantage of a full GPU.

Thank you for reading. If you found this article interesting, please check out my other blog posts on Medium or follow me on Twitter (@shshnkp), LinkedIn or leave a comment below. Want me to write on a specific machine learning topic? I’d love to hear from you!

