Cloud Functions pro tips: Using retries to build reliable serverless systems

栏目: 后端 · 发布时间: 6年前

2018-11-17 admin GoogleCloud No comments

Source: Cloud Functions pro tips: Using retries to build reliable serverless systems from Google Cloud

By now, you’re probably familiar with Cloud Functions, Google Cloud’s event-driven serverless compute platform. You may have even written a few functions yourself. Cloud Functions runs code in response to events emitted by Google Cloud Platform (GCP) or third-party services and clients. This is one of the simplest ways to run your code in the cloud: there are no servers to think about, you just wire your code to events, and pay only when it runs.

In theory, any service you build on top of Cloud Functions will scale automatically and also benefit from Cloud Functions’ availability and fault-tolerance characteristics, for example, provisioning and autoscaling of your functions, and its production-grade SLA.

But does that mean you don’t need to think about reliability issues associated with large-scale distributed systems? The answer is, not completely. While there are many things that GCP and Cloud Functions handle behind the scenes, you still need to keep a couple of best practices in mind while building a reliable serverless solution. In this post, we’ll discuss one of the key aspects of building reliable production systems with Cloud Functions: retrying executions.

Cloud Functions pro tips: Using retries to build reliable serverless systems

Retrying executions in Cloud Functions

The first thing you need to accept is that in distributed systems, failures happen. In fact, they are an inherent part of complex distributed systems, and it’s practically impossible to build one without them. To learn more about fault tolerance in distributed systems, check out Distributed systems: principles and paradigms book by A. S. Tanenbaum and M. van Steen.

You can, however, follow system-design best practices on how to handle failures gracefully. GCP helps in many ways, for example, by handling hardware failures and replicating your data. When you call the Cloud Vision API to identify an image, GCP makes sure that the service is available and can handle your request; when you put an object into a Cloud Storage bucket, Google makes sure it is persisted, using replication. But the overall reliability of your software running in the cloud must be a shared priority between you and the cloud provider.

As mentioned, functions act on events. They are particularly useful for connecting different systems, for example processing incoming data and uploading it to a storage system. In a perfect world, everything works as planned—a function receives data and persists it, for example in a database.

Unfortunately, there are lots of ways a function can fail. For example, it could run out of memory, there could be an external network connection issue, or a dependent third-party service may respond with a transient error. All these issues may result in data loss if the data doesn’t make it to the storage system.

In a serverless environment, the simplest way to handle such failures is to retry executing the function, preferably automatically. (This is certainly better than asking your users to retry the failed requests to your system.)

HTTP vs. background functions

With Cloud Functions, all functions are typically invoked once for each request or event—unless an error occurs. Then, Cloud Functions offers different execution guarantees depending on what type of function you are using: an HTTP function or a background function.

HTTP functions are invoked by standard HTTP requests, which wait for an HTTP response from the function.

Background functions are invoked by cloud events, such as messages on a Cloud Pub/Sub topic, or changes in a Cloud Storage bucket.

HTTP functions are invoked at most once , due to their synchronous nature. This means that Cloud Functions does not retry HTTP calls—the caller must handle any errors. This is as it should be, given that in the case of a network failure, an HTTP request might not even reach Google’s infrastructure in the first place.

On the other hand, background functions are invoked at least once because Cloud Functions can retry asynchronous function invocations triggered by events (as there is no caller who could do it otherwise).

Let’s look at some examples. Here is an HTTP function that uploads the incoming HTTP request body to another service, by sending a POST request to it. Simple, right?

The problem is that the dependent system could fail because of a network connectivity issue, for example. This would result in function execution failure, and potential data loss if the function was not retried.

In case of HTTP functions, the right approach is for the code that calls the function to perform the retries. To implement retries, you can either write custom retry logic, or reuse a common library, such as ‘promise-retry’ in the following sample code. Here, a rejected Promise results in retried calls to the function.

Now, here’s an example of a background function, which also uploads data to an external service. In this case, the data is extracted from a Cloud Pub/Sub message.

This background function has the same problem as our HTTP function—if the call to the dependent service fails, the function execution will fail, as we return the Promise as function result.

To retry background functions, you can usually just enable the ‘retry on failure’ option when deploying your function through the Cloud SDK or Cloud Console. This way, function execution is automatically retried for a given event until it completes successfully. The first retry is triggered almost immediately, and subsequent retries are performed according to the exponential backoff policy, capped at 10 second intervals, for up to seven days. You should be careful when using this option, though, because the function will be retried even if it crashes, returns an error because of a bug in your code, or because of unexpected event data. Therefore, you may want to make your function return an error only on transient issues (such as 503 Service Unavailable status code returned from an external service), and handle permanent problems (such as invalid input data) completely inside the function.

Overall, retries can by applied in multiple places. As in our examples, the caller can retry requests to an HTTP function, while background functions can be configured to automatically be retried on failure. You can also retry calls to dependent systems from inside a function. You can do this by configuring client libraries (look at the options they offer), or by writing custom code. Finally, be sure to distinguish persistent errors from transient failures, and apply retries only on the latter.

Gracefully handling failures with retries is a key aspect of building resilient, reliable serverless code. To help you get started, visit cloud.google.com/functions/ and you can also find all the code we used in this blog post on GitHub . Then, stay tuned for our next blog post, where we look at what to do when retrying a function isn’t enough.

除非特别声明，此文章内容采用知识共享署名 3.0 许可，代码示例采用 Apache 2.0 许可。更多细节请查看我们的服务条款。

Tags: Cloud

以上就是本文的全部内容，希望本文的内容对大家的学习或者工作能带来一定的帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

Cloud Functions pro tips: Using retries to build reliable serverless systems

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

Invisible Users

Jenna Burrell / The MIT Press / 2012-5-4 / USD 36.00

The urban youth frequenting the Internet cafes of Accra, Ghana, who are decidedly not members of their country's elite, use the Internet largely as a way to orchestrate encounters across distance and ......一起来看看《Invisible Users》这本书的介绍吧!

码农工具