Tensorflow Extended, ML Metadata and Apache Beam on the Cloud

A practical and self-contained example using GCP Dataflow

The fully end-to-end example that TensorFlow Extended (TFX) provides, generated by running tfx template copy taxi $target-dir, consists of 17 files scattered across 5 directories. If you are looking for a smaller, simpler, self-contained example that actually runs on the cloud rather than locally, this is it. Setting up the cloud services is also covered here.

What’s going to be covered

We are going to generate statistics and a schema for the Chicago taxi trips CSV dataset, which you can find in the data directory created by the tfx template copy taxi command.
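StatisticsGen and SchemaGen do this at scale with TensorFlow Data Validation. To make the two terms concrete, here is a toy, standard-library-only sketch of the same idea: per-column statistics (counts, missing values, numeric min/max/mean, frequent string values) and a naive inferred schema (a type per column and whether it is always present). It is not TFDV's algorithm or output format, and the CSV rows below are made up rather than taken from the real taxi file.

```python
import csv
import io
from collections import Counter


def infer_type(values):
    """Guess 'INT', 'FLOAT' or 'BYTES' for a column's non-empty string values."""
    def parses_as(cast):
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if values and parses_as(int):
        return "INT"
    if values and parses_as(float):
        return "FLOAT"
    return "BYTES"


def describe_csv(text):
    """Return (stats, schema) dicts keyed by column name for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    stats, schema = {}, {}
    for column in reader.fieldnames or []:
        raw = [row[column] for row in rows]
        present = [v for v in raw if v not in ("", None)]
        col_type = infer_type(present)
        col_stats = {"count": len(raw), "missing": len(raw) - len(present)}
        if col_type in ("INT", "FLOAT"):
            numbers = [float(v) for v in present]
            col_stats.update(min=min(numbers), max=max(numbers),
                             mean=sum(numbers) / len(numbers))
        else:
            # For strings, keep the most frequent values instead of moments.
            col_stats["top_values"] = Counter(present).most_common(3)
        stats[column] = col_stats
        schema[column] = {"type": col_type, "required": col_stats["missing"] == 0}
    return stats, schema


# Made-up rows in the shape of a taxi CSV (not the real dataset).
sample = (
    "trip_miles,fare,payment_type\n"
    "1.5,7.25,Cash\n"
    ",12.0,Credit Card\n"
    "3.0,9.5,Cash\n"
)
stats, schema = describe_csv(sample)
print(schema["trip_miles"])  # {'type': 'FLOAT', 'required': False}
```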

Generated artifacts, such as the data statistics and the schema, will be viewed from a Jupyter notebook, either by connecting to the ML Metadata store or simply by downloading them from plain file/binary storage.
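Conceptually, the metadata store only records which artifacts exist, their types and their URIs; the statistics and schema bytes themselves sit in file storage. The real store is queried through the ml_metadata client library. The sketch below deliberately avoids that dependency and models the lookup with an in-memory SQLite table; the table layout, function names and gs:// URIs are hypothetical and do not reflect MLMD's actual schema. Only the artifact type names ExampleStatistics and Schema are borrowed from TFX.

```python
import sqlite3


def open_toy_store():
    """In-memory stand-in for a metadata store (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artifact ("
        "id INTEGER PRIMARY KEY, type TEXT NOT NULL, uri TEXT NOT NULL)"
    )
    return conn


def record_artifact(conn, artifact_type, uri):
    """Register an artifact; the payload itself stays at `uri`."""
    conn.execute("INSERT INTO artifact (type, uri) VALUES (?, ?)", (artifact_type, uri))


def latest_uri(conn, artifact_type):
    """URI of the most recently registered artifact of a type, or None."""
    row = conn.execute(
        "SELECT uri FROM artifact WHERE type = ? ORDER BY id DESC LIMIT 1",
        (artifact_type,),
    ).fetchone()
    return row[0] if row else None


store = open_toy_store()
record_artifact(store, "ExampleStatistics", "gs://my-bucket/pipeline-root/StatisticsGen/statistics/1")
record_artifact(store, "ExampleStatistics", "gs://my-bucket/pipeline-root/StatisticsGen/statistics/2")
record_artifact(store, "Schema", "gs://my-bucket/pipeline-root/SchemaGen/schema/3")
print(latest_uri(store, "ExampleStatistics"))
# gs://my-bucket/pipeline-root/StatisticsGen/statistics/2
```

Once you have the URI, the notebook can download the artifact from the bucket and load it from there.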

Full code sample at the bottom of the article

Services Used

The whole pipeline can also run on your local machine (or on other cloud providers or your own Spark clusters). The same example can be scaled up to bigger datasets; if you wish to understand how this happens transparently, read this article.
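Switching between local execution and Dataflow is mostly a matter of the Beam pipeline options handed to the pipeline (in TFX, a list like this is what goes into the pipeline's beam_pipeline_args). The helper below is a hypothetical sketch: the option names --runner, --project, --region, --temp_location and --staging_location are standard Beam/Dataflow options, while the function itself and the project, region and bucket values are placeholders.

```python
def beam_pipeline_args(target, project=None, region=None,
                       temp_location=None, staging_location=None):
    """Build Beam pipeline options for a local run or a Dataflow run."""
    if target == "local":
        # DirectRunner executes in-process; nothing is staged remotely.
        return ["--runner=DirectRunner"]
    if target == "dataflow":
        required = {
            "project": project,
            "region": region,
            "temp_location": temp_location,
            "staging_location": staging_location,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError("Dataflow run is missing: " + ", ".join(missing))
        return ["--runner=DataflowRunner"] + [
            f"--{name}={value}" for name, value in required.items()
        ]
    raise ValueError(f"unknown target: {target!r}")


print(beam_pipeline_args("local"))  # ['--runner=DirectRunner']
```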

Execution Process

If running locally, code will not be serialised or sent to the cloud (of course). Otherwise, Beam sends everything to a staging location (typically bucket storage). Check out cloudpickle to get some intuition on how serialisation is done.
Your cloud runner of choice (ours is Dataflow) checks that all the referenced resources exist and are accessible (for example, the pipeline output location and temporary file storage).
Compute instances are started and your pipeline is executed in a distributed fashion; the job shows up in the job inspector both while it is running and after it has finished.
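The reason a library such as cloudpickle matters: Python's standard pickle serialises a function by reference (its module and name), so a remote worker can only rebuild it if it can import that same module, and it refuses lambdas and nested functions outright. cloudpickle can serialise such functions by value. The standard-library sketch below only demonstrates the by-reference behaviour; it does not reproduce what Beam does internally, and try_pickle and make_scaler are illustrative names.

```python
import math
import pickle


def try_pickle(obj):
    """Return pickled bytes, or None when the standard pickler refuses obj."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return None


def make_scaler(factor):
    """Return a closure; closures have no importable name to pickle."""
    def scale(x):
        return x * factor
    return scale


# A module-level function is stored as "module + name", not as code:
payload = try_pickle(math.sqrt)
print(b"math" in payload and b"sqrt" in payload)  # True
print(pickle.loads(payload) is math.sqrt)         # True

# Functions without an importable name are refused by the standard pickler:
print(try_pickle(lambda x: x * x))  # None
print(try_pickle(make_scaler(3)))   # None
```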

It’s good practice to name the temporary-files directory /temp or /tmp and the staging directory /staging or /binaries.
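A small helper can apply that convention and catch obvious mistakes before a job is submitted. This is a sketch under stated assumptions: the function names are hypothetical, it only inspects path strings (it does not check that the bucket exists or is accessible, which the runner does), and the overlap check is a hygiene convention rather than a Dataflow requirement. The gs:// check reflects that Dataflow expects its temp and staging locations in Cloud Storage.

```python
def default_locations(bucket):
    """Derive temp and staging paths in a bucket using /tmp and /staging."""
    base = f"gs://{bucket.strip('/')}"
    return f"{base}/tmp", f"{base}/staging"


def check_locations(temp_location, staging_location):
    """Return a list of problems; an empty list means the paths look sensible."""
    problems = []
    for name, path in (("temp_location", temp_location),
                       ("staging_location", staging_location)):
        if not path.startswith("gs://"):
            problems.append(f"{name} should be a gs:// path for Dataflow: {path}")
    # Compare with trailing slashes so "gs://b/tmp" does not match "gs://b/tmp2".
    temp_dir = temp_location.rstrip("/") + "/"
    staging_dir = staging_location.rstrip("/") + "/"
    if temp_dir.startswith(staging_dir) or staging_dir.startswith(temp_dir):
        problems.append("temp and staging locations overlap")
    return problems


temp, staging = default_locations("my-bucket")
print(temp, staging)                   # gs://my-bucket/tmp gs://my-bucket/staging
print(check_locations(temp, staging))  # []
```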

The TFX Pipeline

TensorFlow Extended provides its own component wrappers around plain old Beam components. They are more decoupled: components only produce and consume artifacts. This means they do not stream the whole dataset from one component to the next every time; they just pass around resource locator strings (URIs). Your dataset is streamed for analysis and preprocessing, then saved in small chunks as TFRecord files for fast reads, taking full advantage of the fast object storage behind Storage Buckets.
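The pattern of passing locators instead of data can be illustrated without TFX. In this hypothetical sketch, plain text shard files stand in for TFRecord files and a local temporary directory stands in for a Storage Bucket; example_gen, count_examples and the shard naming are made-up names. The upstream step writes small shards and returns only a URI in a dict; the downstream step receives that dict and streams the records back from storage.

```python
import os
import tempfile


def write_shards(records, output_dir, records_per_shard=2):
    """Write records into small numbered shard files; return the directory."""
    os.makedirs(output_dir, exist_ok=True)
    for start in range(0, len(records), records_per_shard):
        shard = os.path.join(output_dir, f"data-{start // records_per_shard:05d}.txt")
        with open(shard, "w", encoding="utf-8") as f:
            f.write("\n".join(records[start:start + records_per_shard]) + "\n")
    return output_dir


def read_shards(uri):
    """Stream records back from every shard under uri, in shard order."""
    for name in sorted(os.listdir(uri)):
        with open(os.path.join(uri, name), encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")


def example_gen(records, root):
    # Upstream step: materialise the data, hand back only a locator.
    return {"examples": write_shards(records, os.path.join(root, "examples"))}


def count_examples(inputs):
    # Downstream step: gets the locator and streams records from storage.
    return sum(1 for _ in read_shards(inputs["examples"]))


def demo(records):
    with tempfile.TemporaryDirectory() as root:
        outputs = example_gen(records, root)
        shards = len(os.listdir(outputs["examples"]))
        return outputs, shards, count_examples(outputs)


outputs, shards, total = demo(["a", "b", "c", "d", "e"])
print(shards, total)  # 3 5
```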

This is why, when you declare custom components, you declare strongly typed input and output channels (artifact types and names), which get mapped to multiple tagged inputs and outputs on the Beam side. You return these as a Dict. Feel free to look into the source of the default TFX components for more insight.
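The idea of strongly typed channels can be modelled in a few lines. The sketch below is not TFX's API: ToyComponentSpec and Artifact are hypothetical classes, and only the artifact type names Examples and ExampleStatistics are borrowed from TFX. Each component declares the artifact type expected on every named input and output channel, the executor returns its outputs as a dict, and mismatched types are rejected before and after execution. Artifacts carry a URI, never the data.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Artifact:
    type_name: str  # e.g. "Examples" or "ExampleStatistics"
    uri: str        # where the payload lives; the payload itself is never passed


@dataclass
class ToyComponentSpec:
    inputs: Dict[str, str]   # channel name -> required artifact type
    outputs: Dict[str, str]  # channel name -> produced artifact type
    executor: Callable[[Dict[str, Artifact]], Dict[str, Artifact]]

    @staticmethod
    def _check(kind, declared, actual):
        for name, type_name in declared.items():
            artifact = actual.get(name)
            if artifact is None or artifact.type_name != type_name:
                raise TypeError(f"{kind} {name!r} must be a {type_name} artifact")

    def run(self, inputs):
        self._check("input", self.inputs, inputs)
        outputs = self.executor(inputs)  # executor returns a dict of artifacts
        self._check("output", self.outputs, outputs)
        return outputs


def compute_statistics(inputs):
    # Hypothetical executor: a real one would read the examples and write stats.
    stats_uri = inputs["examples"].uri.rstrip("/") + "/statistics"
    return {"statistics": Artifact("ExampleStatistics", stats_uri)}


statistics_spec = ToyComponentSpec(
    inputs={"examples": "Examples"},
    outputs={"statistics": "ExampleStatistics"},
    executor=compute_statistics,
)
result = statistics_spec.run({"examples": Artifact("Examples", "gs://my-bucket/examples")})
print(result["statistics"].uri)  # gs://my-bucket/examples/statistics
```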

This is why you need to do things like:

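from tfx.components import CsvExampleGen, StatisticsGen

example_gen = CsvExampleGen(...)

# Pass ExampleGen's output channel (a handle to artifacts), not the data itself
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])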
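Here, example_gen.outputs['examples'] is a handle to artifacts that do not exist yet, not the data itself; the orchestrator uses such handles to infer dependencies and an execution order. A hypothetical toy orchestrator (not TFX code; Node, OutputRef and run_in_order are made-up names) shows the mechanism using the standard library's graphlib.TopologicalSorter, available from Python 3.9:

```python
from graphlib import TopologicalSorter


class OutputRef:
    """Placeholder for a not-yet-produced output (plays the role of a channel)."""
    def __init__(self, node, key):
        self.node, self.key = node, key


class Outputs:
    """Makes node.outputs['name'] return a reference instead of data."""
    def __init__(self, node):
        self._node = node

    def __getitem__(self, key):
        return OutputRef(self._node, key)


class Node:
    def __init__(self, name, fn, **inputs):
        self.name, self.fn, self.inputs = name, fn, inputs  # inputs: arg -> OutputRef
        self.outputs = Outputs(self)
        self.results = {}


def run_in_order(nodes):
    """Run nodes so that every producer executes before its consumers."""
    graph = {node: {ref.node for ref in node.inputs.values()} for node in nodes}
    order = list(TopologicalSorter(graph).static_order())
    for node in order:
        # Resolve each reference to the value its producer has already computed.
        kwargs = {arg: ref.node.results[ref.key] for arg, ref in node.inputs.items()}
        node.results = node.fn(**kwargs)
    return [node.name for node in order]


example_gen = Node("CsvExampleGen", lambda: {"examples": "gs://my-bucket/examples"})
statistics_gen = Node("StatisticsGen",
                      lambda examples: {"statistics": examples + "/stats"},
                      examples=example_gen.outputs["examples"])
schema_gen = Node("SchemaGen",
                  lambda statistics: {"schema": statistics + "/schema"},
                  statistics=statistics_gen.outputs["statistics"])
print(run_in_order([schema_gen, statistics_gen, example_gen]))
# ['CsvExampleGen', 'StatisticsGen', 'SchemaGen']
```

Passing the nodes in reverse order still runs CsvExampleGen first, because the order comes from the wiring rather than from the list.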
