How would you like a 1000x speed increase


Good, that's the click-baity title out of the way. Sorry for taking such a long time to write again! There really has been a lot going on.

To get back into blogging, I've decided to quickly write about a change I made a while ago.

This change was for the "instrumented profiler", i.e. the one that rewrites all of the user's program's code at run time in order to measure execution times and count calls and allocations.

In order to get everything right, the instrumented profiler keeps an entire call graph in memory. If you haven't seen something like it yet, imagine taking a stack trace at every point in your program's life; all these stack traces put together form the paths of a tree, each one leading back to the root.

This means, among other things, that the same function can come up multiple times. With recursion, the same function can in fact come up a few hundred times "in a row". In general, if your call tree can become both deep and wide, you can end up with a whole truckload of nodes in your tree.
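To make that concrete, here is a minimal sketch in C (MoarVM's implementation language) of what such a call-graph node could look like. The names and layout are illustrative assumptions, not MoarVM's actual profiler structures:

```c
#include <stdint.h>

/* A call-graph node in first-child / next-sibling form; every distinct
 * call path through the program gets its own node. Names are made up
 * for illustration. */
typedef struct CallNode CallNode;
struct CallNode {
    void     *static_frame;   /* which routine this node stands for    */
    uint64_t  entries;        /* how often this exact call path ran    */
    uint64_t  inclusive_time; /* time spent here and in callees        */
    CallNode *first_child;    /* distinct callees reached from here... */
    CallNode *next_sibling;   /* ...kept as a singly linked list       */
};
```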

Photo by Gabriel Garcia Marengo / Unsplash

Is it a bad thing to have many nodes? Of course, it uses up memory. Only a single path through the tree is ever interesting at any one moment, though. Memory that's not read from or written to is not quite as "expensive": it never has to go into the CPU cache, and is even free to be swapped out to disk and/or compressed. But hold on, is that actually the case?

It turns out that when you're compiling the Core Setting, which is a code file almost 2½ megabytes big with about 71½ thousand lines, and you're profiling during the parsing process, the tree gets enormous. At the same time, the parsing process slows to a crawl. What on earth is wrong here?

Well, looking at what MoarVM spends most of its time doing while the profiler runs gives you a good hint: it's spending almost all of its time going through the entirety of the tree for garbage collection purposes. Why would it do that, you ask? In order to count allocated objects at every node, each count has to be matched with the type being allocated. That means every node holds on to pointers to types, and those pointers have to be kept up to date if anything moves (which the GC does to recently-born things), and they also make sure the types aren't considered unused and thrown out.
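Sketched out in the same illustrative style as above (again, made-up names, not MoarVM's real code), the old situation looks roughly like this; the crucial part is that marking has to visit every single node:

```c
#include <stdint.h>

/* The same call-graph node as before, now shown with its allocation
 * counters: every node owns (type pointer, counter) pairs. */
typedef struct {
    void     *type;    /* a pointer the GC has to manage and update */
    uint64_t  count;   /* allocations of that type at this node     */
} AllocEntry;

typedef struct CallNode CallNode;
struct CallNode {
    AllocEntry *allocs;        /* one entry per type allocated here */
    uint32_t    num_allocs;
    CallNode   *first_child;
    CallNode   *next_sibling;
};

typedef struct GCWorklist GCWorklist;  /* stand-in for the GC's worklist */
extern void gc_worklist_add(GCWorklist *wl, void **obj_ptr);

/* Every GC run had to walk the whole tree to hand each type pointer
 * to the collector: */
static void mark_profile_tree(CallNode *node, GCWorklist *wl) {
    for (uint32_t i = 0; i < node->num_allocs; i++)
        gc_worklist_add(wl, &node->allocs[i].type);
    for (CallNode *c = node->first_child; c; c = c->next_sibling)
        mark_profile_tree(c, wl);      /* cost grows with the whole tree */
}
```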

That's bad, right? Isn't there anything we can do? Well, we have to know at every node which counter belongs to which type, and we need to give all the types we have to the garbage collector to manage. But nothing forces us to have the types right next to the counter. And that's already the solution to the problem:

Holding on to all types is now the job of a little array kept just once per tree, and next to every counter there's just a little number that tells you where in the array to look.
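In the same sketch, with the names still being illustrative, the new layout looks like this:

```c
#include <stdint.h>

/* One table of type pointers per tree; this is the only thing the GC
 * ever needs to see. */
typedef struct {
    void    **types;       /* every type observed while profiling  */
    uint32_t  num_types;
} TypeTable;

/* Per node, the counter now sits next to a small index, not a pointer: */
typedef struct {
    uint32_t type_idx;     /* where in TypeTable.types to look      */
    uint64_t count;        /* allocations of that type at this node */
} AllocEntry;              /* no GC-managed pointer in the node anymore */
```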

This increases the cost of recording an allocation, as you'll now have to go to a separate memory location to match types to counters. On the other hand, the "little number" can be much smaller than before, and that saves memory in every node of the tree.
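Continuing the sketch, recording an allocation could look like the following. This assumes a CallNode holding the new AllocEntry counters, plus two hypothetical helpers: type_table_index_of interns a type into the table, and add_alloc_entry appends a fresh counter to a node.

```c
extern uint32_t type_table_index_of(TypeTable *table, void *type);
extern void     add_alloc_entry(CallNode *node, uint32_t type_idx);

static void record_allocation(CallNode *node, TypeTable *table, void *type) {
    uint32_t idx = type_table_index_of(table, type);  /* the extra hop */
    for (uint32_t i = 0; i < node->num_allocs; i++) {
        if (node->allocs[i].type_idx == idx) {
            node->allocs[i].count++;  /* this type was seen here before */
            return;
        }
    }
    add_alloc_entry(node, idx);       /* first allocation of this type here */
}
```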

More importantly, the time cost of going through the profiler data during garbage collection is now independent of how big the tree is, since the individual nodes don't have to be looked at at all.
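In terms of the sketch, the GC-facing part shrinks to a single pass over the type table, reusing the stand-in GCWorklist from above:

```c
/* Marking the profiler data no longer depends on the number of nodes: */
static void mark_profile_data(TypeTable *table, GCWorklist *wl) {
    for (uint32_t i = 0; i < table->num_types; i++)
        gc_worklist_add(wl, &table->types[i]);  /* O(types), not O(nodes) */
}
```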

With a task as big as parsing the core setting, which is where almost every type, exception, operator, or sub lives, the difference is a factor of at least a thousand. Well, to be honest, I didn't actually calculate the difference, but I'm sure it's somewhere between 100x and 10000x faster. More to the point, going from "ten microseconds per tree node" to "ten microseconds per tree" isn't a single constant-factor increase; it's a complexity improvement from O(n) to O(1). As long as you can find a bigger tree, you can come up with a higher improvement factor. Very useful for writing that blog post you've always wanted to put at the center of a heated discussion about dishonest article titles!

Anyway, on testing my patch, esteemed colleague MasterDuke had this to say on IRC:

timotimo: hot damn, what did you do?!?! stage parse only took almost twice as long (i.e., 60s instead of the normal 37s) instead of the 930s last time i did the profile

(psst, don't check what 930 divided by 60 is, or else you'll expose my blog post title for the fraud that it is!)

Well, that's already all I had for this post. Thanks for your attention, wear a mask (if, by the time you're reading this, the covid19 pandemic is still A Thing, or maybe something new has come up), and stay safe!

Photo by Macau Photo Agency / Unsplash
