Using GitHub Actions for MLOps & Data Science

栏目: IT技术 · 发布时间: 5年前

内容简介:To mitigate these concerns, we have created a series of GitHub Actions that integrate parts of the data science and machine learning workflow with a software development workflow. Furthermore, we provide components and examples that automate common tasks.C

Machine Learning Operations (or MLOps) enables Data Scientists to work in a more collaborative fashion, by providing testing, lineage, versioning, and historical information in an automated way.  Because the landscape of MLOps is nascent, data scientists are often forced to implement these tools from scratch. The closely related discipline of DevOps offers some help, however many DevOps tools are generic and require the implementation of “ML awareness” through custom code. Furthermore, these platforms often require disparate tools that are decoupled from your code leading to poor debugging and reproducibility.

To mitigate these concerns, we have created a series of GitHub Actions that integrate parts of the data science and machine learning workflow with a software development workflow. Furthermore, we provide components and examples that automate common tasks.

An Example Of MLOps Using GitHub Actions

Consider the below example of how an experiment tracking system can be integrated with GitHub Actions to enable MLOps. In the below example, we demonstrate how you can orchestrate a machine learning pipeline to run on the infrastructure of your choice, collect metrics using an experiment tracking system, and report the results back to a pull request.

Using GitHub Actions for MLOps & Data Science

A screenshot of this pull request .

For a live demonstration of the above example, please see this talk .

MLOps is not limited to the above example. Due to the composability of GitHub Actions, you can stack workflows in many ways that can help data scientists. Below is a concrete example of a very simple workflow that adds links to mybinder.org on pull requests:

name: Binder
on: 
  pull_request:
    types: [opened, reopened]

jobs:
  Create-Binder-Badge:
    runs-on: ubuntu-latest
    steps:

    - name: checkout pull request branch
      uses: actions/<a href="/cdn-cgi/l/email-protection" data-cfemail="b7d4dfd2d4dcd8c2c3f7c185">[email protected]</a>
      with:
        ref: ${{ github.event.pull_request.head.sha }}

    - name: comment on PR with Binder link
      uses: actions/<a href="/cdn-cgi/l/email-protection" data-cfemail="e6818f928e9384cb9585948f9692a690d7">[email protected]</a>
      with:
        github-token: ${{secrets.GITHUB_TOKEN}}
        script: |
          var BRANCH_NAME = process.env.BRANCH_NAME;
          github.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: `[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/${context.repo.owner}/${context.repo.repo}/${BRANCH_NAME}) :point_left: Launch a binder notebook on this branch`
          }) 
      env:
        BRANCH_NAME: ${{ github.event.pull_request.head.ref }}

When the above YAML file is added to a repository’s .github/workflow directory, pull requests can be annotated with a useful link as illustrated below [1]:

Using GitHub Actions for MLOps & Data Science

A Growing Ecosystem of MLOps & Data Science Actions

There is a growing number of Actions available for machine learning ops and data science. Below are some concrete examples that are in use today, categorized by topic.

Orchestrating Machine Learning Pipelines :

  • Submit Argo Workflows – allows you to orchestrate machine learning pipelines that run on Kubernetes.
  • Publish Kubeflow Pipelines to GKE – Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.

Jupyter Notebooks :

  • Run parameterized Notebooks – run notebooks programmatically using papermill.
  • Repo2Docker Action – Automatically turn data-science repositories into Jupyter-enabled Docker containers using repo2docker.
  • fastai/fastpages – share information from Jupyter notebooks as blog posts using GitHub Actions & GitHub Pages.

End-To-End Workflow Orchestration :

Experiment Tracking

This is by no means an exhaustive list of the things you might want to automate with GitHub Actions with respect to data science and machine learning.   You can follow our progress towards this goal on our page , which contains links to blog posts , GitHub Actions , talks , and examples that are relevant to this topic.

We invite the community to create other Actions that might be useful for the community. Some ideas for getting started include data and model versioning, model deployment, data validation, as well as expanding upon some of the areas mentioned above. A great place to start is the documentation for GitHub Actions, particularly on how to build Actions for the community !


以上所述就是小编给大家介绍的《Using GitHub Actions for MLOps & Data Science》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

游戏编程算法与技巧

游戏编程算法与技巧

【美】Sanjay Madhav / 刘瀚阳 / 电子工业出版社 / 2016-10 / 89

《游戏编程算法与技巧》介绍了大量今天在游戏行业中用到的算法与技术。《游戏编程算法与技巧》是为广大熟悉面向对象编程以及基础数据结构的游戏开发者所设计的。作者采用了一种独立于平台框架的方法来展示开发,包括2D 和3D 图形学、物理、人工智能、摄像机等多个方面的技术。《游戏编程算法与技巧》中内容几乎兼容所有游戏,无论这些游戏采用何种风格、开发语言和框架。 《游戏编程算法与技巧》的每个概念都是用C#......一起来看看 《游戏编程算法与技巧》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试