Fast, parallel applications with WebAssembly SIMD

栏目: IT技术 · 发布时间: 4年前

内容简介:SIMD stands forThe WebAssembly SIMD proposal defines a portable, performant subset of SIMD operations that are available across most modern architectures. This proposal derived many elements from theThe high-level goal of the WebAssembly SIMD proposal is t

SIMD stands for Single Instruction, Multiple Data . SIMD instructions are a special class of instructions that exploit data parallelism in applications by simultaneously performing the same operation on multiple data elements. Compute intensive applications like audio/video codecs, image processors, are all examples of applications that take advantage of SIMD instructions to accelerate performance. Most modern architectures support some variants of SIMD instructions.

The WebAssembly SIMD proposal defines a portable, performant subset of SIMD operations that are available across most modern architectures. This proposal derived many elements from the SIMD.js proposal , which in turn was originally derived from the Dart SIMD specification. The SIMD.js proposal was an API proposed at TC39 with new types and functions for performing SIMD computations, but this was archived in favor of supporting SIMD operations more transparently in WebAssembly. The WebAssembly SIMD proposal was introduced as a way for browsers to take advantage of the data level parallelism using the underlying hardware.

WebAssembly SIMD proposal

The high-level goal of the WebAssembly SIMD proposal is to introduce vector operations to the WebAssembly Specification, in a way that guarantees portable performance.

The set of SIMD instructions is large, and varied across architectures. The set of operations included in the WebAssembly SIMD proposal consist of operations that are well supported on a wide variety of platforms, and are proven to be performant. To this end, the current proposal is limited to standardizing Fixed-Width 128-bit SIMD operations.

The current proposal introduces a new v128 value type, and a number of new operations that operate on this type. The criteria used to determine these operations are:

  • The operations should be well supported across multiple modern architectures.
  • Performance wins should be positive across multiple relevant architectures within an instruction group.
  • The chosen set of operations should minimize performance cliffs if any.

The proposal is in active development, both V8 and the toolchain have working prototype implementations for experimentation. As these are prototype implementations, they are subject to change as new operations are added to the proposal.

Using WebAssembly SIMD

Enabling experimental SIMD support in Chrome

WebAssembly SIMD support is prototyped behind a flag in Chrome, to try out the SIMD support on the browser, pass --enable-features=WebAssemblySimd , or toggle the "WebAssembly SIMD support" flag in chrome://flags . This work is bleeding edge, and continuously being worked on. To minimize the chances of breakage, please use the latest version of the toolchain as detailed below, and a recent Chrome Canary. If something doesn’t look right, please file a bug .

Building C / C++ to target SIMD

WebAssembly’s SIMD support depends on using a recent build of clang with the WebAssembly LLVM backend enabled. Emscripten has support for the WebAssembly SIMD proposal as well. Install and activate the latest-upstream distribution of emscripten using emsdk to use the bleeding edge SIMD features.

./emsdk install latest-upstream

./emsdk activate latest-upstream

There are a couple of different ways to enable generating SIMD code when porting your application to use SIMD. Once the latest upstream emscripten version has been installed, compile using emscripten, and pass the -msimd128 flag to enable SIMD.

emcc -msimd128 -O3 foo.c -o foo.js

Applications that have already been ported to use WebAssembly may benefit from SIMD with no source modifications thanks to LLVM’s autovectorization optimizations.

These optimizations can automatically transform loops that perform arithmetic operations on each iteration into equivalent loops that perform the same arithmetic operations on multiple inputs at a time using SIMD instructions. LLVM’s autovectorizers are enabled by default at optimization levels -O2 and -O3 when the -msimd128 flag is supplied.

For example, consider the following function that multiplies the elements of two input arrays together and stores the results in an output array.

void multiply_arrays(int* out, int* in_a, int* in_b, int size) {
  for (int i = 0; i < size; i++) {
    out[i] = in_a[i] * in_b[i];
  }
}

Without passing the -msimd128 flag, the compiler emits this WebAssembly loop:

(loop
  (i32.store
    … get address in `out` …
    (i32.mul
      (i32.load … get address in `in_a` …)
      (i32.load … get address in `in_b` …)
  …
)

But when the -msimd128 flag is used, the autovectorizer turns this into code that includes the following loop:

(loop
  (v128.store align=4
    … get address in `out` …
    (i32x4.mul
       (v128.load align=4 … get address in `in_a` …)
       (v128.load align=4 … get address in `in_b` …)
    …
  )
)

The loop body has the same structure but SIMD instructions are being used to load, multiply, and store four elements at a time inside the loop body.

For finer grained control over the SIMD instructions generated by the compiler, include this header file , which defines a set of intrinsics. Intrinsics are special functions that, when called, will be turned by the compiler into the corresponding WebAssembly SIMD instructions, unless it can make further optimizations.

As an example, here is the same function from before manually rewritten to use the SIMD intrinsics.

#include <wasm_simd128.h>

void multiply_arrays(int* out, int* in_a, int* in_b, int size) {
  for (int i = 0; i < size; i += 4) {
    v128_t a = wasm_v128_load(&in_a[i]);
    v128_t b = wasm_v128_load(&in_b[i]);
    v128_t prod = wasm_i32x4_mul(a, b);
    wasm_v128_store(&out[i], prod);
  }
}

This manually rewritten code assumes that the input and output arrays are aligned and do not alias and that size is a multiple of four. The autovectorizer cannot make these assumptions and has to generate extra code to handle the cases where they are not true, so hand-written SIMD code often ends up being smaller than autovectorized SIMD code.

Compelling use cases

The WebAssembly SIMD proposal seeks to accelerate high compute applications like audio/video codecs, image processing applications, cryptographic applications, etc. Currently WebAssembly SIMD is experimentally supported in widely used open source projects like Halide , OpenCV.js , and XNNPACK .

Some interesting demos come from the MediaPipe project by the Google Research team.

As per their description, MediaPipe is a framework for building multimodal (eg. video, audio, any time series data) applied ML pipelines. And they have a Web version , too!

One of the most visually appealing demos where it’s easy to observe the difference in performance SIMD makes, is a following hand-tracking system. Without SIMD, you can get only around 3 frames per second on a modern laptop, while with SIMD enabled you get a much smoother experience at 15-16 frames per second.

Visit the link in Chrome Canary with SIMD enabled to try it!

Another interesting set of demos that makes use of SIMD for smooth experience, come from OpenCV - a popular computer vision library that can also be compiled to WebAssembly. They’re available by link , or you can check out the pre-recorded versions below:

Card reading
Invisibility cloak
Emoji replacement

The current SIMD proposal is in Phase 2, so the future work here is to push the proposal forward in the standardization process. Fixed width SIMD gives significant performance gains over scalar, but it doesn’t effectively leverage wider width vector operations that are available in modern hardware. As the current proposal moves forward, some future facing work here is to determine the feasibility of extending the proposal with longer width operations.

To try out current experimental support, use the latest-upstream Emscripten toolchain, and a recent Chrome Canary as detailed. Please note that as the support is experimental, we are actively working on feature completion and performance so some breakage, or performance inconsistencies are possible.


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

计算机程序设计艺术(第2卷)

计算机程序设计艺术(第2卷)

Donald E. Knuth / 苏运霖 / 国防工业出版社 / 2002-8 / 98.00元

本书是国内外业界广泛关注的7卷本《计算机程序设计艺术》第2卷的最新版。本卷对半数值算法领域做了全面介绍,分“随机数”和“算术”两章。本卷总结了主要算法范例及这些算法的基本理论,广泛剖析了计算机程序设计与数值分析间的相互联系,其中特别值得注意的是作者对随机数生成程序的重新处理和对形式幂级数计算的讨论。 本书附有大量习题和答案,标明了难易程度及数学概念的使用。 本书内容精辟,语言流畅,引人入胜,可供从......一起来看看 《计算机程序设计艺术(第2卷)》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

SHA 加密
SHA 加密

SHA 加密工具