Fast, parallel applications with WebAssembly SIMD

栏目: IT技术 · 发布时间: 4年前

内容简介:SIMD stands forThe WebAssembly SIMD proposal defines a portable, performant subset of SIMD operations that are available across most modern architectures. This proposal derived many elements from theThe high-level goal of the WebAssembly SIMD proposal is t

SIMD stands for Single Instruction, Multiple Data . SIMD instructions are a special class of instructions that exploit data parallelism in applications by simultaneously performing the same operation on multiple data elements. Compute intensive applications like audio/video codecs, image processors, are all examples of applications that take advantage of SIMD instructions to accelerate performance. Most modern architectures support some variants of SIMD instructions.

The WebAssembly SIMD proposal defines a portable, performant subset of SIMD operations that are available across most modern architectures. This proposal derived many elements from the SIMD.js proposal , which in turn was originally derived from the Dart SIMD specification. The SIMD.js proposal was an API proposed at TC39 with new types and functions for performing SIMD computations, but this was archived in favor of supporting SIMD operations more transparently in WebAssembly. The WebAssembly SIMD proposal was introduced as a way for browsers to take advantage of the data level parallelism using the underlying hardware.

WebAssembly SIMD proposal

The high-level goal of the WebAssembly SIMD proposal is to introduce vector operations to the WebAssembly Specification, in a way that guarantees portable performance.

The set of SIMD instructions is large, and varied across architectures. The set of operations included in the WebAssembly SIMD proposal consist of operations that are well supported on a wide variety of platforms, and are proven to be performant. To this end, the current proposal is limited to standardizing Fixed-Width 128-bit SIMD operations.

The current proposal introduces a new v128 value type, and a number of new operations that operate on this type. The criteria used to determine these operations are:

  • The operations should be well supported across multiple modern architectures.
  • Performance wins should be positive across multiple relevant architectures within an instruction group.
  • The chosen set of operations should minimize performance cliffs if any.

The proposal is in active development, both V8 and the toolchain have working prototype implementations for experimentation. As these are prototype implementations, they are subject to change as new operations are added to the proposal.

Using WebAssembly SIMD

Enabling experimental SIMD support in Chrome

WebAssembly SIMD support is prototyped behind a flag in Chrome, to try out the SIMD support on the browser, pass --enable-features=WebAssemblySimd , or toggle the "WebAssembly SIMD support" flag in chrome://flags . This work is bleeding edge, and continuously being worked on. To minimize the chances of breakage, please use the latest version of the toolchain as detailed below, and a recent Chrome Canary. If something doesn’t look right, please file a bug .

Building C / C++ to target SIMD

WebAssembly’s SIMD support depends on using a recent build of clang with the WebAssembly LLVM backend enabled. Emscripten has support for the WebAssembly SIMD proposal as well. Install and activate the latest-upstream distribution of emscripten using emsdk to use the bleeding edge SIMD features.

./emsdk install latest-upstream

./emsdk activate latest-upstream

There are a couple of different ways to enable generating SIMD code when porting your application to use SIMD. Once the latest upstream emscripten version has been installed, compile using emscripten, and pass the -msimd128 flag to enable SIMD.

emcc -msimd128 -O3 foo.c -o foo.js

Applications that have already been ported to use WebAssembly may benefit from SIMD with no source modifications thanks to LLVM’s autovectorization optimizations.

These optimizations can automatically transform loops that perform arithmetic operations on each iteration into equivalent loops that perform the same arithmetic operations on multiple inputs at a time using SIMD instructions. LLVM’s autovectorizers are enabled by default at optimization levels -O2 and -O3 when the -msimd128 flag is supplied.

For example, consider the following function that multiplies the elements of two input arrays together and stores the results in an output array.

void multiply_arrays(int* out, int* in_a, int* in_b, int size) {
  for (int i = 0; i < size; i++) {
    out[i] = in_a[i] * in_b[i];
  }
}

Without passing the -msimd128 flag, the compiler emits this WebAssembly loop:

(loop
  (i32.store
    … get address in `out` …
    (i32.mul
      (i32.load … get address in `in_a` …)
      (i32.load … get address in `in_b` …)
  …
)

But when the -msimd128 flag is used, the autovectorizer turns this into code that includes the following loop:

(loop
  (v128.store align=4
    … get address in `out` …
    (i32x4.mul
       (v128.load align=4 … get address in `in_a` …)
       (v128.load align=4 … get address in `in_b` …)
    …
  )
)

The loop body has the same structure but SIMD instructions are being used to load, multiply, and store four elements at a time inside the loop body.

For finer grained control over the SIMD instructions generated by the compiler, include this header file , which defines a set of intrinsics. Intrinsics are special functions that, when called, will be turned by the compiler into the corresponding WebAssembly SIMD instructions, unless it can make further optimizations.

As an example, here is the same function from before manually rewritten to use the SIMD intrinsics.

#include <wasm_simd128.h>

void multiply_arrays(int* out, int* in_a, int* in_b, int size) {
  for (int i = 0; i < size; i += 4) {
    v128_t a = wasm_v128_load(&in_a[i]);
    v128_t b = wasm_v128_load(&in_b[i]);
    v128_t prod = wasm_i32x4_mul(a, b);
    wasm_v128_store(&out[i], prod);
  }
}

This manually rewritten code assumes that the input and output arrays are aligned and do not alias and that size is a multiple of four. The autovectorizer cannot make these assumptions and has to generate extra code to handle the cases where they are not true, so hand-written SIMD code often ends up being smaller than autovectorized SIMD code.

Compelling use cases

The WebAssembly SIMD proposal seeks to accelerate high compute applications like audio/video codecs, image processing applications, cryptographic applications, etc. Currently WebAssembly SIMD is experimentally supported in widely used open source projects like Halide , OpenCV.js , and XNNPACK .

Some interesting demos come from the MediaPipe project by the Google Research team.

As per their description, MediaPipe is a framework for building multimodal (eg. video, audio, any time series data) applied ML pipelines. And they have a Web version , too!

One of the most visually appealing demos where it’s easy to observe the difference in performance SIMD makes, is a following hand-tracking system. Without SIMD, you can get only around 3 frames per second on a modern laptop, while with SIMD enabled you get a much smoother experience at 15-16 frames per second.

Visit the link in Chrome Canary with SIMD enabled to try it!

Another interesting set of demos that makes use of SIMD for smooth experience, come from OpenCV - a popular computer vision library that can also be compiled to WebAssembly. They’re available by link , or you can check out the pre-recorded versions below:

Card reading
Invisibility cloak
Emoji replacement

The current SIMD proposal is in Phase 2, so the future work here is to push the proposal forward in the standardization process. Fixed width SIMD gives significant performance gains over scalar, but it doesn’t effectively leverage wider width vector operations that are available in modern hardware. As the current proposal moves forward, some future facing work here is to determine the feasibility of extending the proposal with longer width operations.

To try out current experimental support, use the latest-upstream Emscripten toolchain, and a recent Chrome Canary as detailed. Please note that as the support is experimental, we are actively working on feature completion and performance so some breakage, or performance inconsistencies are possible.


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

App研发录:架构设计、Crash分析和竞品技术分析

App研发录:架构设计、Crash分析和竞品技术分析

包建强 / 机械工业出版社 / 2015-10-21 / CNY 59.00

本书是作者多年App开发的经验总结,从App架构的角度,重点总结了Android应用开发中常见的实用技巧和疑难问题解决方法,为打造高质量App提供有价值的实践指导,迅速提升应用开发能力和解决疑难问题的能力。本书涉及的问题有:Android基础建设、网络底层框架设计、缓存、网络流量优化、制定编程规范、模块化拆分、Crash异常的捕获与分析、持续集成、代码混淆、App竞品技术分析、项目管理和团队建设等......一起来看看 《App研发录:架构设计、Crash分析和竞品技术分析》 这本书的介绍吧!

CSS 压缩/解压工具
CSS 压缩/解压工具

在线压缩/解压 CSS 代码

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试