Deploy Machine Learning Pipeline on Google Kubernetes Engine




A step-by-step beginner’s guide to containerize and deploy ML pipeline on Google Kubernetes Engine

RECAP

In our last post on deploying a machine learning pipeline to the cloud, we demonstrated how to develop a machine learning pipeline in PyCaret, containerize it with Docker, and serve it as a web app using Microsoft Azure Web App Services. If you haven’t heard of PyCaret before, please read this announcement to learn more.

In this tutorial, we will use the same machine learning pipeline and Flask app that we built and deployed previously. This time we will demonstrate how to containerize and deploy a machine learning pipeline on Google Kubernetes Engine.

 Learning Goals of this Tutorial

  • What is a Container, What is Docker, What is Kubernetes, and What is Google Kubernetes Engine?
  • Build a Docker image and upload it on Google Container Registry (GCR).
  • Create clusters and deploy machine learning pipeline with Flask app as a web service.
  • See a web app in action that uses a trained machine learning pipeline to predict on new data points in real-time.

Previously we demonstrated how to deploy an ML pipeline on Heroku PaaS and how to deploy an ML pipeline on Azure Web Services with a Docker container.

This tutorial will cover the entire workflow, from building a Docker image and uploading it to Google Container Registry, to deploying the pre-trained machine learning pipeline and Flask app onto Google Kubernetes Engine (GKE).

 Toolbox for this tutorial

PyCaret

PyCaret is an open source, low-code machine learning library in Python that is used to train and deploy machine learning pipelines and models into production. PyCaret can be installed easily using pip.

pip install pycaret

Flask

Flask is a framework that allows you to build web applications. A web application can be a commercial website, blog, e-commerce system, or an application that generates predictions from data provided in real-time using trained models. If you don’t have Flask installed, you can use pip to install it.
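pip install flask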

Google Cloud Platform

Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail and YouTube. If you do not have an account with GCP, you can sign up here. If you are signing up for the first time, you will get free credits for one year.

Let’s get started.

Before we get into Kubernetes, let’s understand what a container is and why we need one.

Have you ever had the problem where your code works fine on your computer, but when your friend tries to run the exact same code, it doesn’t work? If your friend is repeating the exact same steps, they should get the same results, right? The one-word answer to this is the environment. Your friend’s environment is different from yours.

What does an environment include? → The programming language, such as Python, and all the libraries and dependencies, with the exact versions, with which the application was built and tested.

If we can create an environment that we can transfer to other machines (for example: your friend’s computer or a cloud service provider like Google Cloud Platform), we can reproduce the results anywhere. Hence, a container is a type of software that packages up an application and all its dependencies so the application runs reliably from one computing environment to another.

What’s Docker then?

Docker is a company that provides software (also called Docker) that allows users to build, run and manage containers. While Docker’s containers are the most common, there are other, less well-known alternatives such as LXD and LXC that provide container solutions.

Now that you understand containers and Docker specifically, let’s understand what Kubernetes is all about.

What is Kubernetes?

Kubernetes is a powerful open-source system developed by Google back in 2014 for managing containerized applications. In simple words, Kubernetes is a system for running and coordinating containerized applications across a cluster of machines. It is a platform designed to completely manage the life cycle of containerized applications.

Kubernetes is an open-source container management system

Features

✔️ Load Balancing: Automatically distributes the load between containers.

✔️ Scaling: Automatically scale up or down by adding or removing containers when demand changes such as peak hours, weekends and holidays.

✔️ Storage: Keeps storage consistent with multiple instances of an application.

✔️ Self-healing: Automatically restarts containers that fail and kills containers that don’t respond to your user-defined health check.

✔️ Automated Rollouts: You can automate Kubernetes to create new containers for your deployment, remove existing containers and adopt all their resources into the new containers.

Why do you need Kubernetes if you have Docker?

Imagine a scenario where you have to run multiple docker containers on multiple machines to support an enterprise level ML application with varied workloads during day and night. As simple as it may sound, it is a lot of work to do manually.

You need to start the right containers at the right time, figure out how they can talk to each other, handle storage considerations, and deal with failed containers or hardware. This is the problem Kubernetes is solving by allowing large numbers of containers to work together in harmony, reducing operational burden.

It’s a mistake to compare Docker with Kubernetes. These are two different technologies. Docker is software that allows you to containerize applications, while Kubernetes is a container management system that allows you to create, scale and monitor hundreds or thousands of containers.

In the lifecycle of any application, Docker is used for packaging the application at the time of deployment, while Kubernetes is used for the rest of the application’s life to manage it.

Lifecycle of an application deployed through Kubernetes / Docker

What is Google Kubernetes Engine?

Google Kubernetes Engine is an implementation of Google’s open-source Kubernetes on Google Cloud Platform. Simple!

Other popular alternatives to GKE are Amazon ECS and Microsoft Azure Kubernetes Service.

One final time, do you understand this?

  • Container is a type of software that packages up an application and all its dependencies so the application runs reliably from one computing environment to another.
  • Docker is a software used for building and managing containers.
  • Kubernetes is an open-source system for managing containerized applications in a clustered environment.
  • Google Kubernetes Engine is an implementation of open source Kubernetes framework on Google Cloud Platform.

In this tutorial we will use Google Kubernetes Engine. In order to follow along, you must have a Google Cloud Platform account. Click here to sign up for free.

Setting the Business Context

An insurance company wants to improve its cash flow forecasting by better predicting patient charges using demographic and basic patient health risk metrics at the time of hospitalization.

(data source)

Objective

To build and deploy a web application where the demographic and health information of a patient is entered into a web-based form which then outputs a predicted charge amount.

Tasks

  • Train and develop a machine learning pipeline for deployment.
  • Build a web app using Flask framework. It will use the trained ML pipeline to generate predictions on new data points in real-time.
  • Build a Docker image and upload the container onto Google Container Registry (GCR).
  • Create clusters and deploy the app on Google Kubernetes Engine.

Since we have already covered the first two tasks in our first tutorial, we will quickly recap them and focus on the remaining tasks in the list above. If you are interested in learning more about developing a machine learning pipeline in Python using PyCaret and building a web app using the Flask framework, please read this tutorial.

 Develop Machine Learning Pipeline

We are using PyCaret in Python for training and developing a machine learning pipeline that will be used as part of our web app. The machine learning pipeline can be developed in an Integrated Development Environment (IDE) or a notebook. We used a notebook to run the code below:
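A minimal sketch of that notebook code is shown here for reference; it assumes the insurance dataset bundled with PyCaret and a simple linear regression model, so the actual notebook may differ:

# a minimal sketch, assuming PyCaret's bundled insurance dataset and a linear regression model
from pycaret.datasets import get_data
from pycaret.regression import setup, create_model, save_model

data = get_data('insurance')                          # patient demographics and health metrics; target is 'charges'
reg = setup(data, target='charges', session_id=123)   # defines the preprocessing/transformation pipeline
lr = create_model('lr')                               # train a linear regression model
save_model(lr, model_name='deployment_28042020')      # saves the entire pipeline + model as deployment_28042020.pkl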

When you save a model in PyCaret, the entire transformation pipeline based on the configuration defined in the setup() function is created. All inter-dependencies are orchestrated automatically. See the pipeline and model stored in the ‘deployment_28042020’ variable:

Machine Learning Pipeline created using PyCaret

 Build Web Application

This tutorial is not focused on building a Flask application. It is only discussed here for completeness. Now that our machine learning pipeline is ready, we need a web application that can connect to our trained pipeline to generate predictions on new data points in real-time. We have created the web application using the Flask framework in Python. There are two parts to this application (a minimal back-end sketch follows the list below):

  • Front-end (designed using HTML)
  • Back-end (developed using Flask)
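The sketch below is not the exact app.py from the repository; the route, form handling and the ‘home.html’ template name are assumptions, shown only to illustrate how the Flask back-end connects to the saved PyCaret pipeline:

# a minimal back-end sketch; the actual app.py in the repository may differ
from flask import Flask, request, render_template
from pycaret.regression import load_model, predict_model
import pandas as pd

app = Flask(__name__)
model = load_model('deployment_28042020')        # load the saved PyCaret pipeline

@app.route('/', methods=['GET', 'POST'])
def home():
    if request.method == 'POST':
        # collect the submitted form fields into a single-row DataFrame
        # and cast numeric fields (age, bmi, children) where possible
        data = pd.DataFrame([request.form.to_dict()]).apply(pd.to_numeric, errors='ignore')
        prediction = predict_model(model, data=data)
        return render_template('home.html', pred=round(prediction['Label'][0], 2))
    return render_template('home.html', pred='')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)           # the GKE service in Step 8 targets port 8080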

This is how our web application looks:

Web application on local machine

If you haven’t followed along so far, no problem. You can simply fork this repository from GitHub. This is how your project folder should look at this point:

Now that we have a fully functional web application, we can start the process of containerizing and deploying the app on Google Kubernetes Engine.

10 steps to deploy an ML pipeline on Google Kubernetes Engine:

 Step 1 — Create a new project in GCP Console

Sign in to your GCP console and go to Manage Resources.

Google Cloud Platform Console → Manage Resources

Click on Create New Project

Google Cloud Platform Console → Manage Resources → Create New Project

 Step 2 — Import Project Code

Click the Activate Cloud Shell button at the top of the console window to open Cloud Shell.

Google Cloud Platform (Project Info Page)

Execute the following code in Cloud Shell to clone the GitHub repository of this tutorial.

git clone https://github.com/pycaret/pycaret-deployment-google.git

 Step 3 — Set Project ID Environment Variable

Execute the following code to set the PROJECT_ID environment variable.

export PROJECT_ID=pycaret-kubernetes-demo

pycaret-kubernetes-demo is the name of the project we chose in Step 1 above.

 Step 4 — Build the Docker image

Build the docker image of the application and tag it for uploading by executing the following code:

docker build -t gcr.io/${PROJECT_ID}/insurance-app:v1 .
Message returned when docker build is successful
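The docker build command above expects a Dockerfile at the root of the cloned repository. The actual Dockerfile in the repository may differ, but a minimal sketch could look like this:

# a minimal sketch; the actual Dockerfile in the repository may differ
FROM python:3.7-slim
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt    # installs pycaret, flask and other dependencies
EXPOSE 8080
CMD ["python", "app.py"]               # starts the Flask app listening on port 8080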

You can check the available images by running the following code:

docker images
Output of “docker images” command on Cloud Shell

 Step 5 — Upload the container image

  1. Authenticate to Container Registry (you need to run this only once):
gcloud auth configure-docker

2. Execute the following code to upload the docker image to Google Container Registry:

docker push gcr.io/${PROJECT_ID}/insurance-app:v1

 Step 6 — Create Cluster

Now that the container is uploaded, you need a cluster to run it. A cluster consists of a pool of Compute Engine VM instances running Kubernetes.

  1. Set your project ID and Compute Engine zone options for the gcloud tool:
gcloud config set project $PROJECT_ID 
gcloud config set compute/zone us-central1

2. Create a cluster by executing the following code:

gcloud container clusters create insurance-cluster --num-nodes=2
Google Cloud Platform → Kubernetes Engine → Clusters

 Step 7 — Deploy Application

To deploy and manage applications on a GKE cluster, you must communicate with the Kubernetes cluster management system. Execute the following command to deploy the application:

kubectl create deployment insurance-app --image=gcr.io/${PROJECT_ID}/insurance-app:v1
Output returned on creating deployment through kubectl

 Step 8 — Expose your application to the internet

By default, the containers you run on GKE are not accessible from the internet because they do not have external IP addresses. Execute the following code to expose the application to the internet:

kubectl expose deployment insurance-app --type=LoadBalancer --port 80 --target-port 8080

 Step 9 — Check Service

Execute the following code to get the status of the service. EXTERNAL-IP is the web address you can use in your browser to view the published app.

kubectl get service
Cloud Shell → kubectl get service

 Step 10 — See the app in action on http://34.71.77.61:8080

Final app uploaded on http://34.71.77.61:8080

Note: By the time this story is published, the app will have been removed from the public address to restrict resource consumption.

Link to GitHub Repository for this tutorial

Link to GitHub Repository for Microsoft Azure Deployment

Link to GitHub Repository for Heroku Deployment

