PHP 8: JIT performance in real-life web applications


For those interested in the JIT in PHP 8, I did some benchmarks for you in a real-world web application scenario. Be aware that these benchmarks don't say anything about whether the JIT is useful in general; they only show whether it can improve the performance of your average web application, or not.

These benchmarks are run on my local machine. As such, they don't say anything about absolute performance gains; we're only able to draw conclusions about what kind of relative impact the JIT has on our code.

I'll be using one of my hobby projects, written in Laravel. Since these benchmarks were run on the first alpha version of PHP 8, I had to manually fix some deprecation warnings in Laravel's source code, all locally.

Finally: I'll be running PHP FPM, configured to spawn 20 child processes, and I'll always make sure to only run 20 concurrent requests at once, just to eliminate any extra performance hits on the FPM level. Sending these requests is done with ApacheBench, using the following command, where -n 100 sends 100 requests in total, -c 20 keeps 20 of them running concurrently, and -l tells ab not to report errors when response lengths vary, as expected for dynamic pages:

ab -n 100 -c 20 -l http://aggregate.stitcher.io.test:8081/discover
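For completeness: pinning FPM to exactly 20 workers can be done with a static pool configuration along these lines (a sketch of the relevant directives only, not my actual pool config):

; FPM pool configuration, e.g. www.conf
pm = static
pm.max_children = 20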

The JIT setup requires a section on its own. Honestly, this is one of the most confusing ways of configuring a PHP extension I've ever seen, and I'm afraid the syntax is here to stay, since we're too close to PHP 8's feature freeze for another RFC to make changes to it.

So here goes:

The JIT is enabled by specifying opcache.jit_buffer_size in php.ini. If this directive is excluded, the default value is set to 0, and the JIT won't run.

Next, there are several JIT control options. They are all stored in a single directive called opcache.jit and could, for example, look like this:

opcache.jit_buffer_size=100M
opcache.jit=1235
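Once that's set, you can double-check whether the JIT actually kicked in: in PHP 8, opcache_get_status() exposes a jit section (a quick sanity check; the exact keys are based on the current alpha and might still change):

<?php

// Inspect opcache's status to confirm the JIT is enabled
// and to see how much of the buffer is in use.
$status = opcache_get_status();

var_dump($status['jit']['enabled'] ?? null);
var_dump($status['jit']['buffer_size'] ?? null);
var_dump($status['jit']['buffer_free'] ?? null);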

The RFC explains the meaning of each digit. Mind you: this is not a bitmask; each digit simply represents another configuration option. The RFC lists the following options:

O — Optimization level

0 don't JIT
1 minimal JIT (call standard VM handlers)
2 selective VM handler inlining
3 optimized JIT based on static type inference of individual function
4 optimized JIT based on static type inference and call tree
5 optimized JIT based on static type inference and inner procedure analyses

T — JIT trigger

0 JIT all functions on first script load
1 JIT function on first execution
2 Profile on first request and compile hot functions on second request
3 Profile on the fly and compile hot functions
4 Compile functions with @jit tag in doc-comments

R — register allocation

0 don't perform register allocation
1 use local linear-scan register allocator
2 use global linear-scan register allocator

C — CPU specific optimization flags

0 none
1 enable AVX instruction generation

One small gotcha: the RFC lists these options in reverse order, so the first digit represents the C value, the second the R, and so on.

Anyways, the RFC proposes 1235 as the best default: it will do maximum jitting, profile on the fly, use a global linear-scan register allocator (whatever that might be), and enable AVX instruction generation.
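Applying the reverse-order gotcha, that default decomposes digit by digit like this:

; opcache.jit=1235, read left to right as C, R, T, O
; C=1: enable AVX instruction generation
; R=2: use global linear-scan register allocator
; T=3: profile on the fly and compile hot functions
; O=5: optimized JIT based on static type inference and inner procedure analyses
opcache.jit=1235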

In my benchmarks, I'll use several variations of JIT configuration, in order to compare the differences.

So let's start benchmarking!

Establishing a baseline

First it's best to establish whether the JIT is working properly or not. We know from the RFC that it does have a significant impact on calculating a Mandelbrot — something most of us probably don't do in our web apps.

So let's start with that example. I copied some Mandelbrot code, along the lines of the sketch below, and accessed it via the same HTTP application I'll run the next benchmarks on.
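This is a minimal escape-time implementation, similar in spirit to the benchmark used in the RFC; my exact copy may have differed, but the workload is the same kind of tight, CPU-bound loop:

<?php

// Render a small ASCII Mandelbrot set: a tight, CPU-bound loop,
// exactly the kind of code the JIT is known to speed up.
function mandelbrot(int $width = 80, int $height = 40, int $maxIterations = 1000): string
{
    $output = '';

    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            // Map the pixel to a point c on the complex plane.
            $cr = ($x - $width * 0.7) / ($width / 3);
            $ci = ($y - $height / 2) / ($height / 2.5);

            $zr = 0.0;
            $zi = 0.0;
            $i = 0;

            // Iterate z = z^2 + c until it escapes or we give up.
            while ($zr * $zr + $zi * $zi <= 4.0 && $i < $maxIterations) {
                $tmp = $zr * $zr - $zi * $zi + $cr;
                $zi = 2.0 * $zr * $zi + $ci;
                $zr = $tmp;
                $i++;
            }

            $output .= $i === $maxIterations ? '*' : ' ';
        }

        $output .= "\n";
    }

    return $output;
}

echo '<pre>' . mandelbrot() . '</pre>';

These are the results: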

requests/second (more is better)
Mandelbrot without JIT 15.24
Mandelbrot with JIT 38.99

Great, it looks like the JIT is working! That's more than a twofold performance increase. Let's move on to our first real-life comparison. We're going to start slow: the JIT configured with 1231, and 100 MB of buffer size.
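Reading the digits right to left again, that means full on-the-fly profiling, but only minimal optimisation:

opcache.jit_buffer_size=100M
; C=1: AVX, R=2: global linear-scan, T=3: profile on the fly, O=1: minimal JIT
opcache.jit=1231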

The page we're benchmarking shows an overview of posts, so there's some recursion happening, and we're touching several core parts of Laravel as well: routing, DI, the ORM, authentication.

requests/second (more is better)
No JIT 6.48
JIT enabled (1231, 100M buffer) 6.33

Hm. A decrease in performance after enabling the JIT? Sure, that's possible! What the JIT does is look at the code while it's executing, discover "hot" parts of it, and optimise those as machine code for the next run.

With the current configuration, analysing the code will happen on the fly, on every request. If there's little or no code to optimise, it's natural that there will be a performance price to pay.

So let's test a different setup, the one the RFC proposes as the most optimal one: 1235.

requests/second (more is better)
No JIT 6.48
JIT enabled (1235, 100M buffer) 6.75

Here we see an increase, albeit a teeny-tiny one. Turns out there were some parts that could be optimised, and their performance gain outweighed the performance cost.

There are two more things to test. First: what if we don't profile on every request, but only at the start? That's what the T option is for; value 2 means "profile on first request and compile hot functions on second request".

In other words, let's use 1225 as the JIT option.
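Compared to the previous run, only the T digit changes:

opcache.jit_buffer_size=100M
; C=1: AVX, R=2: global linear-scan, T=2: profile on the first request, compile on the second, O=5: full optimisation
opcache.jit=1225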

requests/second (more is better)
No JIT 6.48
JIT enabled (1235, 100M buffer) 6.75
JIT enabled (1225, 100M buffer) 6.78

Once again an increase in performance, and "small" would be an understatement!

One thing I'm wondering though: if we're only profiling on the first request, there are probably parts of the code that will miss out on optimisations; that's something that needs some more research.

So I suspect that using 1225 in benchmarks has a positive impact because we're always requesting the same page, but that in practice it will probably be a less optimal approach.

Finally, let's bump the buffer limit. Let's give the JIT a little more room to breathe, with 500 MB of memory.
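In ini form, only the buffer directive changes:

opcache.jit=1235
opcache.jit_buffer_size=500M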

requests/second (more is better)
No JIT 6.48
JIT enabled (1235, 100M buffer) 6.75
JIT enabled (1235, 500M buffer) 6.52

A slight decrease in performance, and one I honestly can't explain. I'm sure someone smarter than me can provide the answer though!

So, that concludes my JIT testing. As expected: the JIT probably won't have a significant impact on web applications, at least not right now.

I won't discuss my thoughts on whether the JIT itself is a good addition or not in this post; let's have those discussions together on social media!

