At least, you can build software with machine learning
Mar 11 · 5 min read
Every few months, a new benchmark seems to be set by machine learning research teams. Looking at natural language processing, for example:
- In February 2019, OpenAI announced GPT-2, a state-of-the-art NLP model trained with 1.5 billion parameters.
- By September, Salesforce had released an even bigger NLP model in CTRL, which had 1.63 billion parameters.
- In January 2020, Google then released an even bigger NLP model called Meena, which was supposed to be the best conversational agent yet.
- One month after Meena set new records, Microsoft released Turing-NLG, an insane 17 billion parameter NLP model.
That’s how quickly the state-of-the-art moves in machine learning research. This frenetic pace, however, has not neatly translated to a surge in new ML applications.
The reason for this is that while much energy has been spent over the last decade on machine learning research, less has gone into building tools and abstractions that make it easy for engineers to build ML applications.
In order for a new generation of ML applications to be unlocked, state-of-the-art machine learning models need to become as accessible to software engineers as any other library—and fortunately, we’re starting to see that happen.
Abstraction is how software gets built
There is hardly a web app that can exist without authentication, and almost all authentication schemes involve the hashing of passwords. Most web developers (ideally) understand why password hashing is necessary from a security perspective, and have experience implementing it.
But how many of those web developers write their own hashing functions?
Virtually none is the answer. Instead, they use a hashing library to abstract away the underlying cryptography, and focus on building their app. For example, instead of writing hundreds of lines of code to hash a password, an engineer writes:
import bcrypt

salt = bcrypt.gensalt()
hashed_pass = bcrypt.hashpw(password.encode("utf-8"), salt)
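To see just how much that one call hides, here is a self-contained sketch of the full hash-and-verify round trip using only Python's standard library (hashlib.scrypt stands in for bcrypt here, since bcrypt is a third-party package):

```python
import hashlib
import hmac
import os

# Cost parameters for scrypt; bcrypt makes similar choices internally.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1)

def hash_password(password):
    """Return (salt, digest) for a password; the caller stores both."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def check_password(password, salt, digest):
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
print(check_password("hunter2", salt, digest))      # prints True
print(check_password("wrong-guess", salt, digest))  # prints False
```

Even this is more cryptographic detail than most engineers want to carry around in their heads, which is exactly why libraries like bcrypt exist.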
All of this probably feels obvious—of course, you don’t have to write a hashing function from scratch—but as the production machine learning ecosystem is young, this layer of abstraction is still being defined.
As a result, there is still a disconnect between advances in machine learning research and the ability of software engineers to turn those advances into products.
But that is changing.
The ML community is bridging the abstraction gap
Returning to the hashing example, what libraries like bcrypt did was give software engineers an interface that allowed them to treat complex hashing operations as simple, high-level functions.
The ML community is starting to do the same with prediction serving. We are seeing more projects dedicated to building an interface such that software engineers can treat a trained model as a predict() function. Instead of treating GPT-2, for example, as a highly complex transformer model, engineers can conceptualize it as just a GPT2_predict() function that takes an input string and returns an output string.
One of the most popular ways to build this interface is to deploy a trained model as a microservice. The predict() function that engineers interface with then becomes a simple wrapper around the model's API.
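A minimal sketch of that pattern, using only the standard library: a toy "model" that simply reverses its input (an assumption for illustration; a real service would load a trained model) is served over HTTP, and predict() hides all of the request/response plumbing:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def toy_model(text):
    # Stand-in for a real trained model's inference call.
    return text[::-1]

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        result = json.dumps({"text": toy_model(body["text"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind to an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
MODEL_API_URL = "http://127.0.0.1:%d/predict" % server.server_port

def predict(text):
    """The interface engineers actually see: string in, string out."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = Request(MODEL_API_URL, data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["text"]

print(predict("hello"))  # prints "olleh"
```

Swap the toy model for GPT-2 behind the same endpoint and predict() becomes GPT2_predict(), without the caller changing a line.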
There are a few popular open source platforms, like TF Serving and ONNX Runtime, that provide an easy interface for generating predictions from a model, but deploying a model to the cloud still presents its own infrastructure challenges:
- Models can be huge. GPT-2, OpenAI’s popular NLP model, is over 5 GB.
- Predictions are computationally expensive. Even with GPUs, many models take seconds, or even minutes, to generate a prediction.
- Concurrency is a pain. A single prediction can fully utilize an instance, meaning instances need to autoscale aggressively to handle increases in traffic.
To handle these challenges, an engineer would need to wrangle tools like Flask, Docker, Kubernetes, and whatever APIs their cloud platform provides. In other words, they would have to become versed in ML-specific DevOps.
There are several projects working on providing a layer of abstraction over this infrastructure. For example, Cortex, a project I contribute to, is focused on abstracting all of this away by converting trained models into scalable APIs with a CLI and a config file.
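To make the idea concrete, a config file for a tool like this might look something like the following (an illustrative sketch, not Cortex's exact schema; the field names here are assumptions):

```yaml
# cortex.yaml -- illustrative only; field names are assumptions
- name: text-generator        # name of the deployed API
  predictor:
    type: python
    path: predictor.py        # script that loads the model and defines predict()
  compute:
    gpu: 1                    # request a GPU instance for inference
  autoscaling:
    min_replicas: 1
    max_replicas: 10          # scale out under load
```

A single CLI command then turns this into a load-balanced, autoscaling web API, so the engineer never has to touch Kubernetes directly.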
As projects like these mature, machine learning comes closer to just being a library engineers import to build their application, and as a consequence, we come closer to seeing a flood of new ML-powered software.
This trend isn’t pure speculation, either. It’s something we’re already seeing.
ML-native apps are machine learning’s CRUD apps
In understanding the impact that closing the abstraction gap will have, it’s useful to look at how abstractions unlocked a new generation of web apps.
When you look at web applications, how many of them are, at their core, “just” simple CRUD apps? How many primarily store and modify user data, with those operations abstracted away by ORMs, and display that data to authorized users, with that process abstracted away by hashing and authentication libraries?
A similar dynamic seems to be happening within machine learning. More and more, startups are launching products whose core functionality—or at least, a major part of their core functionality—is to serve predictions from a trained model.
We refer to these products as ML-native, and in a previous article, I put together a list of ML-native startups by looking just at computer vision products:
- Ezra, Zebra Medical, and Arterys are all startups that use computer vision models to analyze MRIs for anomalies.
- SkinVision, SkinIQ, and TroveSkin all use your phone’s camera and a computer vision model to analyze your skin for everything from acne to melanoma.
- Comma.ai, Pony.ai, and Phantom.ai all use computer vision models to help cars navigate autonomously.
- Actuate (formerly Aegis AI), Athena Security, and Synapse Technology all use computer vision models to detect weapons in video footage.
As the abstraction gap continues to close in machine learning, there will be an explosion of ML-native products, similar to the wave of CRUD apps that hit the market as web frameworks made it easier for engineers to build them.
Machine learning’s future relies on both researchers and engineers
Going back to the hashing parallel one last time (I promise), it’s important to note how the efforts of researchers and software engineers interplay to push the field forward.
In the web development world, most of the popular abstractions for authentication—be it built-in functionality of frameworks like Django and Rails, or dedicated projects like Passport.js—are built on bcrypt. The people who designed bcrypt, Niels Provos and David Mazières, are both security researchers by profession.
In this example, the work done by dedicated researchers pushes the state-of-the-art forward, and is then wrapped up in a layer of abstraction that makes this new frontier available to engineers, unlocking a new wave of software.
The same dynamic has emerged within machine learning. Every time OpenAI, Google, Microsoft, or some other ML research team releases a new model, they’re really releasing new functionality that—given the right abstractions—engineers can use to build new products.
In other words, data scientists and researchers are focused on the fundamentals of machine learning, conducting experiments to train models to do things we’ve never seen before—and to engineers, these trained models will become just another library they import to build new products.