Open source parallel processing for Gatsby

Category: IT Technology · Published: 4 years ago


To help the greater Gatsby ecosystem shorten the time it takes from commit to deploy, today I've just submitted a freshly baked Pull Request as my first larger contribution to the Gatsby open source project.

The new gatsby-parallel-runner plugin builds on some of the existing work in the Gatsby project to allow both plugins and core parts of Gatsby to parallelize certain tasks by delegating the work to a large pool of serverless functions.

Gatsby Cloud pioneered this approach, and with this plugin I hope that we can open an even more generalized approach to parallelization. While it's in early development, my goal is that this can one day be made available on any CI/CD environment, and empower individual plugin developers in the ecosystem to build in support for parallelization when a task is well suited for it.

Gatsby and "External Jobs"

A bit under a month ago, Ward Peeters from the Gatsby team landed a Pull Request to "enable external jobs with ipc". The idea behind it is that instead of running gatsby build directly, an orchestrating parent process can fork gatsby build with an environment variable called ENABLE_GATSBY_EXTERNAL_JOBS set. When that is done, certain jobs are sent to the parent process over Node's IPC channel as "external jobs", so that the parent orchestrator can parallelize their execution efficiently.
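To make that flow concrete, here is a minimal TypeScript sketch of the orchestrator's side. The message shapes (`{ type, payload }` with `JOB_CREATED`, `JOB_COMPLETED` and `JOB_FAILED`) and the helper names `buildChildEnv` and `handleGatsbyMessage` are assumptions for illustration, not Gatsby's documented protocol, and the sketch assumes any non-empty value of ENABLE_GATSBY_EXTERNAL_JOBS turns the feature on. The job runner is passed in, so the logic can be exercised without forking a real build.

```typescript
// Hypothetical message shapes; check Gatsby's jobs manager source for the real ones.
interface JobPayload {
  id: string;
  name: string;
  args: unknown;
}
type JobCreatedMessage = { type: "JOB_CREATED"; payload: JobPayload };
type ReplyMessage =
  | { type: "JOB_COMPLETED"; payload: { id: string; result: unknown } }
  | { type: "JOB_FAILED"; payload: { id: string; error: string } };

// Environment for the forked `gatsby build`, with external jobs switched on.
// Assumption: any non-empty value of ENABLE_GATSBY_EXTERNAL_JOBS enables them.
function buildChildEnv(
  base: Record<string, string | undefined>,
): Record<string, string | undefined> {
  return { ...base, ENABLE_GATSBY_EXTERNAL_JOBS: "true" };
}

// Runs one external job and builds the reply to send back to the child.
// Returns undefined for messages that are not job requests.
async function handleGatsbyMessage(
  msg: unknown,
  runJob: (job: JobPayload) => Promise<unknown>,
): Promise<ReplyMessage | undefined> {
  if (typeof msg !== "object" || msg === null) return undefined;
  if ((msg as { type?: unknown }).type !== "JOB_CREATED") return undefined;

  const { payload } = msg as JobCreatedMessage;
  try {
    const result = await runJob(payload);
    return { type: "JOB_COMPLETED", payload: { id: payload.id, result } };
  } catch (err) {
    return { type: "JOB_FAILED", payload: { id: payload.id, error: String(err) } };
  }
}
```

In a real orchestrator you would pass `buildChildEnv(process.env)` as the `env` option to Node's `child_process.fork()`, listen for the child's `"message"` events, and `child.send()` whatever `handleGatsbyMessage` returns.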

Node.js itself is single-threaded, so out of the box any CPU-intensive job in Gatsby will only ever use one CPU core. This IPC-based delegation opens up the possibility of handing that work to external worker processes and running it in parallel.

The only plugin that currently hooks into this is gatsby-plugin-sharp, which is used for image transformation. At Netlify we've often seen image transformations be a huge source of build slowdowns for Gatsby sites, since doing lots of image processing within a single-threaded build is really inefficient.

Until now, the open source Gatsby project has not offered any orchestrator implementation that can help you take advantage of external jobs, and that's where our new gatsby-parallel-runner helps.

Gatsby Parallel Runner

I've been having fun building out this new Gatsby plugin, which acts as an alternative build command. Once it's installed in a project, you run gatsby-parallel-runner instead of gatsby build.

Out of the box it comes with a parallelized implementation of the Sharp image plugin based on Google Cloud Functions. It includes a script that sets up your own functions and queues in a Google Cloud project with a single command. It's built to be extensible, so the community can easily add alternative implementations of the actual execution layer; obvious candidates would be AWS Lambda functions or a Node.js cluster implementation. It should also be easy to use with new plugins that want to add external jobs beyond image processing.
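One way to picture that extensibility is an executor interface with a small dispatcher in front of it. The TypeScript sketch below is hypothetical (`JobExecutor` and `runAll` are not part of gatsby-parallel-runner's API): each backend, whether Google Cloud Functions, AWS Lambda or a local Node.js pool, would implement `execute`, while `runAll` caps how many jobs are in flight and keeps results in input order.

```typescript
// Hypothetical pluggable execution layer: one implementation per backend.
interface JobExecutor<J, R> {
  name: string;
  execute(job: J): Promise<R>;
}

// Dispatches jobs with at most `limit` in flight, preserving result order.
async function runAll<J, R>(
  executor: JobExecutor<J, R>,
  jobs: J[],
  limit: number,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: R[] = new Array(jobs.length);
  let next = 0;

  // Each lane keeps claiming the next unclaimed job until none remain.
  const lane = async (): Promise<void> => {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await executor.execute(jobs[i]);
    }
  };

  await Promise.all(Array.from({ length: Math.min(limit, jobs.length) }, lane));
  return results;
}
```

Separating dispatch from execution this way means a new backend only has to handle one job at a time, while the concurrency limit lives in a single place.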

My hope is that this can help pave the way for more innovations in the ecosystem around build parallelization – and of course we're seeing a lot of opportunity in adding a more generalized form of this to Netlify's own build layer.

Our philosophy has always been to keep the build layer fundamentally open. Our core build image has always been Open Source, as is our new Build Plugin layer, and we've always believed that a healthy Open Source ecosystem in the build tool space is vital to the growth of the whole JAMstack category. So we're happy to contribute this project back to the Open Source community.

I ran a few benchmarks using the official Gatsby image benchmark repository on Netlify's build environment, both with and without gatsby-parallel-runner, and was thrilled to see gatsby-parallel-runner consistently outperform the normal gatsby build command:

Running 3 times from a clear cache with gatsby build:

Run 1:
11:10:27 PM: success Generating image thumbnails - 351.218s - 3234/3234 9.21/s

Run 2:
1:45:43 PM: success Generating image thumbnails - 384.171s - 3234/3234 8.42/s

Run 3:
11:18:22 PM: success Generating image thumbnails - 322.853s - 3234/3234 10.02/s

Avg time for image generation: 352.747s
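The rate at the end of each log line is simply thumbnails divided by seconds, and the average above is the mean of the three durations. This TypeScript sketch parses the lines and recomputes both; the regular expression is fitted to this particular log format and is an assumption, not a stable Gatsby output contract.

```typescript
interface ThumbnailRun {
  seconds: number;
  done: number;
  total: number;
  perSecond: number;
}

// Fitted to lines like "success Generating image thumbnails - 351.218s - 3234/3234 9.21/s".
const RUN_PATTERN = /Generating image thumbnails - ([\d.]+)s - (\d+)\/(\d+) ([\d.]+)\/s/;

function parseRun(line: string): ThumbnailRun {
  const m = RUN_PATTERN.exec(line);
  if (!m) throw new Error(`Unrecognized log line: ${line}`);
  return {
    seconds: Number(m[1]),
    done: Number(m[2]),
    total: Number(m[3]),
    perSecond: Number(m[4]),
  };
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

const gatsbyBuildRuns = [
  "11:10:27 PM: success Generating image thumbnails - 351.218s - 3234/3234 9.21/s",
  "1:45:43 PM: success Generating image thumbnails - 384.171s - 3234/3234 8.42/s",
  "11:18:22 PM: success Generating image thumbnails - 322.853s - 3234/3234 10.02/s",
].map(parseRun);

for (const run of gatsbyBuildRuns) {
  // Recomputed rate (thumbnails / seconds) printed next to the logged rate.
  console.log((run.done / run.seconds).toFixed(2), run.perSecond);
}
console.log(mean(gatsbyBuildRuns.map((run) => run.seconds)).toFixed(3)); // 352.747
```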

Running 6 times from a clear cache with gatsby-parallel-runner:

Run 1:
10:51:31 PM: success Generating image thumbnails - 158.438s - 3234/3234 20.41/s

Run 2:
5:33:33 PM: success Generating image thumbnails - 68.016s - 3234/3234 47.55/s

Run 3:
3:03:48 AM: success Generating image thumbnails - 75.731s - 3234/3234 42.70/s

Run 4:
10:54:47 PM: success Generating image thumbnails - 64.478s - 3234/3234 50.16/s

Run 5:
10:58:31 PM: success Generating image thumbnails - 66.021s - 3234/3234 48.98/s

Run 6:
11:01:58 PM: success Generating image thumbnails - 71.416s - 3234/3234 45.28/s


Avg time for image generation: 84.017s

The first run after deploying the functions was noticeably slower than the subsequent runs (158.438s versus 64 to 76s) while Google scaled up the number of concurrent function executions, but even in that worst case it was still more than twice as fast as the standard gatsby build command.

And on average, even with that initial outlier included, the parallel runner gave roughly a 4.2x speedup over the single-threaded build (352.747s versus 84.017s).
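The speedup figures follow directly from the logged durations. This TypeScript sketch recomputes them, using the average gatsby build time as the baseline and the slowest (first) parallel run as the worst case:

```typescript
// Image-generation times in seconds, copied from the runs above.
const gatsbyBuildSeconds = [351.218, 384.171, 322.853];
const parallelRunnerSeconds = [158.438, 68.016, 75.731, 64.478, 66.021, 71.416];

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

const baseline = mean(gatsbyBuildSeconds);
// Average speedup, with the slow first run included.
const averageSpeedup = baseline / mean(parallelRunnerSeconds);
// Worst case: the slowest parallel run against the average gatsby build.
const worstCaseSpeedup = baseline / Math.max(...parallelRunnerSeconds);

console.log(mean(parallelRunnerSeconds).toFixed(3)); // 84.017
console.log(averageSpeedup.toFixed(3)); // 4.199
console.log(worstCaseSpeedup.toFixed(2)); // 2.23
```

The average ratio lands just under 4.2, so it reads best as roughly a 4.2x speedup.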

Out of curiosity, I repeated the same benchmark on Gatsby Cloud:

Run 1:
03:02:38 AM: success Generating image thumbnails - 98.472s - 3234/3234 32.84/s

Run 2:
07:34:53 AM: success Generating image thumbnails - 328.141s - 3234/3234 9.86/s

Run 3:
22:19:50 PM: success Generating image thumbnails - 85.101s - 3234/3234 38.00/s

Run 4:
22:34:22 PM: success Generating image thumbnails - 134.721s - 3234/3234 24.01/s

Run 5:
23:02:37 PM: success Generating image thumbnails - 82.822s - 3234/3234 39.05/s

Run 6:
23:07:31 PM: success Generating image thumbnails - 60.532s - 3234/3234 53.43/s


Avg time for image generation: 131.631s

These tests had a lot more variability in build times than my Netlify-based tests (anywhere from 60.532s to 328.141s), and while the average was more than twice as fast as the single-threaded build, the open source parallel runner performed significantly better in the tests I ran. So hopefully the Gatsby Cloud team can also benefit from looking into the source code behind this implementation.
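One simple way to put a number on that variability is the sample standard deviation of each set of runs. The post doesn't quantify spread itself, so treat this TypeScript sketch as an illustration of the comparison rather than part of the original benchmark; it reuses the durations logged above and the 352.747s gatsby build average as the baseline.

```typescript
// Image-generation times in seconds, copied from the logs above.
const platforms = [
  { label: "Netlify + gatsby-parallel-runner", seconds: [158.438, 68.016, 75.731, 64.478, 66.021, 71.416] },
  { label: "Gatsby Cloud", seconds: [98.472, 328.141, 85.101, 134.721, 82.822, 60.532] },
];
const GATSBY_BUILD_AVERAGE = 352.747; // plain `gatsby build` baseline, in seconds

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Sample standard deviation (n - 1 in the denominator).
function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1));
}

function summarize(seconds: number[]) {
  return {
    mean: mean(seconds),
    stdDev: stdDev(seconds),
    speedup: GATSBY_BUILD_AVERAGE / mean(seconds),
  };
}

for (const { label, seconds } of platforms) {
  const s = summarize(seconds);
  console.log(
    `${label}: mean ${s.mean.toFixed(1)}s, std dev ${s.stdDev.toFixed(1)}s, speedup ${s.speedup.toFixed(2)}x`,
  );
}
```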

Setting Up

Install it in your Gatsby project:

npm i gatsby-parallel-runner

To use it with Google Cloud, set the relevant environment variables in your shell:

export GOOGLE_APPLICATION_CREDENTIALS=~/path/to/your/google-credentials.json

export TOPIC=parallel-runner-topic

Deploy the cloud function:

npx gatsby-parallel-runner deploy

Then run your Gatsby build with the parallel runner instead of the default gatsby build command:

npx gatsby-parallel-runner
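Both commands depend on the two environment variables exported earlier, so a preflight check can catch a missing one before a deploy or build fails halfway. This hypothetical TypeScript helper (`missingEnv` is not part of gatsby-parallel-runner) only checks that each variable is set and non-empty; it does not validate the credentials file itself.

```typescript
// Environment variables the Google Cloud setup steps ask you to export.
const REQUIRED_ENV = ["GOOGLE_APPLICATION_CREDENTIALS", "TOPIC"] as const;

// Hypothetical preflight helper: names of required variables that are unset or empty.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

const missing = missingEnv({ TOPIC: "parallel-runner-topic" });
if (missing.length > 0) {
  console.log(`Missing environment variables: ${missing.join(", ")}`);
}
```

A wrapper script could call `missingEnv(process.env)` and stop with a clear message whenever the result is non-empty.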
