Machine learning is still too hard to use



But things are starting to get easier

Mar 26 · 6 min read

Source: Pexels

Disclaimer: The following is based on my observations of machine learning teams — not an academic survey of the industry. For context, I’m a contributor to Cortex, an open source platform for deploying models in production.

Take something ubiquitous in software, like a database. What does it mean to build one? To a Postgres contributor, “creating a database” looks like a million lines of C. To a Rails dev, it looks like rake db:create.

Obviously, neither are wrong, they just represent different levels of abstraction, appropriate for the different focuses of each engineer.

This is how software builds on itself. The basic software that powers modern applications—databases, web servers, request routers, hashing libraries, etc.—becomes widespread in no small part due to the layers of abstraction that make it accessible to non-specialists.

Machine learning has historically lacked that layer of abstraction, limiting its adoption. Now, however, things are changing. There is a new wave of projects focused specifically on making applied machine learning easier.

Models need a developer-friendly interface

In order to use machine learning in production, you need:

  • The expertise to design your model
  • Enough data and funding to train your model
  • The ML infrastructure knowledge to deploy your model

Any project using ML, as a result, needs to be staffed by several specialists. This is a bottleneck that has to be removed.

It should be possible for a developer with little background in machine learning to use it in production, just as a developer with no background in cryptography can still apply hashing libraries to secure user data.

Fortunately, this is finally happening.

Bridging the machine learning abstraction gap

In order for applied ML to become widespread, a developer must be able to take a high level understanding of machine learning (what a model is, what fine tuning and inference mean, etc.) and, using available abstractions, build an app.

Many of the necessary abstractions are already being worked on, and they fall into a few key areas of focus:

1. There needs to be an easier way to train models

The reality is that for many of applied machine learning’s use cases, there is no need to train a new model from scratch.

For example, if you are developing a conversational agent, Google’s Meena is almost certainly going to outperform your model. If you’re developing a text generator, you should use OpenAI’s GPT-2 instead of building your own from scratch. For object detection, a model like YOLOv3 is probably your best bet.

Thanks to transfer learning, a process in which the “knowledge” of a neural network is fine tuned to a new domain, you can take a relatively small amount of data and fine tune these open source, state-of-the-art models to your task.

For example, with new libraries like gpt-2-simple, you can fine tune GPT-2 using a simple command line interface:

$ gpt_2_simple finetune your_custom_data.txt

With this layer of abstraction, developers don’t need deep ML expertise—they just need to know what fine tuning is.
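gpt-2-simple also exposes a Python API for the same workflow. As a rough sketch (the model name and number of training steps here are illustrative, so check the gpt-2-simple README for the current parameters), fine tuning and generating text looks something like this:

import gpt_2_simple as gpt2

# download the pretrained 124M-parameter GPT-2 checkpoint
gpt2.download_gpt2(model_name="124M")

# fine tune on your own text file for a fixed number of steps
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "your_custom_data.txt", model_name="124M", steps=1000)

# sample text from the fine tuned model
gpt2.generate(sess)

The training loop, checkpointing, and sampling logic are all hidden behind a handful of function calls.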

And gpt-2-simple is far from the only training abstraction available. Google’s Cloud AutoML gives users a GUI that allows them to select their dataset and automatically train a new model, no code necessary:

Source: Google Cloud Vision

Writing about AutoML, Sundar Pichai said “We hope AutoML will take an ability that a few PhDs have today and will make it possible in three to five years for hundreds of thousands of developers to design new neural nets for their particular needs.”

2. Generating predictions from models needs to be simple

Okay, so it’s easier to get a trained model for your particular task. How do you generate predictions from that model?

There are a ton of projects which offer model serving functionality, many of which are connected to popular ML frameworks. TensorFlow, for example, has TF Serving, and ONNX has ONNX Runtime.
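To give a sense of how thin this layer can be, here is a minimal sketch of generating a prediction with ONNX Runtime’s Python API (the model path and input shape are placeholders for whatever model you have exported):

import numpy as np
import onnxruntime as ort

# load an exported ONNX model (the file name is a placeholder)
session = ort.InferenceSession("model.onnx")

# construct a dummy input matching the model's expected input name and shape
input_name = session.get_inputs()[0].name
dummy_input = np.random.randn(1, 3, 224, 224).astype(np.float32)

# run inference; the result is a list of output arrays
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)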

Outside of the tech giants, there are also a number of independent open source projects working on this problem. For example, Bert Extractive Summarizer is a project that makes it easy to extract summaries of text using Google’s BERT. Below is an example from the docs:

from summarizer import Summarizer

body = 'Text body that you want to summarize with BERT'
body2 = 'Something else you want to summarize with BERT'

# instantiating Summarizer() loads the BERT-based extractive summarization model
model = Summarizer()

# calling the model returns an extractive summary of the given text
model(body)
model(body2)

Generating a prediction with the library is as simple as an import statement and a call to Summarizer().

As more projects like these continue to launch and develop, it becomes easier for developers to generate predictions from models without having to dig into the model itself.

3. Deploying models needs to be simple

The final bottleneck is infrastructure.

Serving predictions for a toy application is straightforward, but when your application needs to scale, things get difficult. Using GPT-2 as an example:

  • GPT-2 is over 5 GB. You need a larger, and therefore more expensive, server to host a model this big.
  • GPT-2 is compute hungry. Serving a single prediction can occupy a CPU at 100% utilization for several minutes. Even with a GPU, a single prediction can still take seconds. Compare this to a web app, which can serve hundreds of concurrent users with one CPU.
  • GPT-2 is memory hungry. Beyond its considerable disk space and compute requirements, GPT-2 also needs a large amount of memory to run without crashing.

In order to handle even a small surge in users, your infrastructure would need to scale up many replicas of your application. This means containerizing your model with Docker, orchestrating your containers with Kubernetes, and configuring your autoscaling with whatever cloud platform you use.

Building the infrastructure to handle machine learning deployments requires learning an entire stack of tools, many of which will not be familiar to most developers who don’t have devops backgrounds:

Machine learning infrastructure stack

In order for machine learning to become accessible to developers, machine learning infrastructure also needs to be abstracted. This is where projects like Cortex (full disclosure: I’m a contributor) come in.

Cortex abstracts away the underlying devops of model deployment with a config file and a CLI:

Source: Cortex Repo

The goal of projects like Cortex is simple: Take a trained model, and turn it into a prediction API that any developer can use.
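As a rough illustration of that workflow (Cortex’s exact Predictor interface and config format have evolved across versions, so treat this as a sketch rather than the canonical API), you write a small Python class that loads your model and serves predictions, point a YAML config at it, and run cortex deploy:

# predictor.py: hypothetical sketch of a Cortex Python Predictor
from summarizer import Summarizer

class PythonPredictor:
    def __init__(self, config):
        # runs once when the API starts up; load the model into memory here
        self.model = Summarizer()

    def predict(self, payload):
        # runs on every request; payload is the parsed JSON request body
        return self.model(payload["text"])

Cortex then handles containerizing this code, provisioning the cluster, and autoscaling replicas behind a single API endpoint.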

Making applied machine learning easier

Let me be clear: the underlying math behind machine learning will always be hard. No one is a machine learning expert just because they can call a predict() function. The point is that a developer shouldn’t have to be a machine learning expert (or a devops expert, for that matter) to use ML in their application.

The machine learning ecosystem is finally focusing on making applied ML easier. A developer with just a little knowledge can fine tune a state-of-the-art model, wrap it in an API, and deploy it on scalable infrastructure using open source, intuitive abstractions.

As a result, applied machine learning is about to become easier—and by extension, accessible to virtually all developers.

