内容简介:But as OCaml 4.10.0 has now hit the shelves, a very exciting feature isAt OCamlPro, some of the tools that we develop, such as the
The Garbage Collector probably is OCaml’s greatest unsung hero. Its pragmatic approach allows us to allocate without much fear of efficiency loss. In a way, the fact that most OCaml hackers know little about it is a good sign: you want a runtime to gracefully do its job without having to mind it all the time.
But as OCaml 4.10.0 has now hit the shelves, a very exciting feature is in the changelog :
- #8809, #9292: Add a best-fit allocator for the major heap; still experimental, it should be much better than current allocation policies (first-fit and next-fit) for programs with large heaps, reducing both GC cost and memory usage. This new best-fit is not (yet) the default; set it explicitly with OCAMLRUNPARAM="a=2" (or Gc.set from the program). You may also want to increase the `space_overhead` parameter of the GC (a percentage, 80 by default), for example OCAMLRUNPARAM="o=85", for optimal speed. (Damien Doligez, review by Stephen Dolan, Jacques-Henri Jourdan, Xavier Leroy, Leo White)
At OCamlPro, some of the tools that we develop, such as the Opam installer, theAlt-Ergo SMT solver or the Flambda optimizer, can be quite demanding in memory usage, so we were curious to better understand the properties of this new allocator.
Minor heap and Major heap: the GC in a nutshell
Not all values are allocated equal. Some will only be useful for the span of local calculations, some will last as long as the program lives. To handle those two kinds of values, the runtime uses a Generational Garbage Collector with two spaces:
- The minor heap uses the Stop-and-copy principle. It is fast but has to stop the computation to perform a full iteration.
- The major heap uses the Mark-and-sweep principle. It has the perk of being incremental and behaves better for long-lived data.
Allocation in the minor heap is straightforward and efficient: values are stored sequentially, and when there is no space anymore, space is emptied, surviving values get allocated in the major heap while dead values are just forgotten for free. However, the major heap is a bit more tricky, since we will have random allocations and deallocations that will eventually produce a scattered memory. This is called fragmentation , and this means that you’re using more memory than necessary. Thankfully, the GC has two strategies to counter that problem:
- Compaction: a heavyweight reallocation of everything that will remove those holes in our heap. OCaml’s compactor is cleverly written to work in constant space, and would be worth its own specific article!
- Free-list Allocation: allocating the newly coming data in the holes (the free-list) in memory, de-scattering it in the process.
Of course, asking the GC to be smarter about how it allocates data makes the GC slower. Coding a good GC is a subtle art: you need to have something smart enough to avoid fragmentation but simple enough to run as fast as possible.
Where and how to allocate: the 3 strategies
OCaml used to propose 2 free-list allocation strategies: next-fit , the default, and first-fit . Version 4.10 of OCaml introduces the new best-fit strategy. Let’s compare them:
Next-fit, the original and remaining champion
OCaml’s original (and default) “next-fit” allocating strategy is pretty simple:
- Keep a (circular) list of every hole in memory ordered by increasing addresses;
- Have a pointer on an element of that list;
- When an allocation is needed, if the currently pointed-at hole is big enough, allocate in it;
- Otherwise, try the next hole and so-on.
This strategy is extremely efficient, but a big hole might be fragmented with very small data while small holes stay unused. In some cases, the GC would trigger costly compactions that would have been avoidable.
First-fit, the unsuccessful contender
To counteract that problem, the “first-fit” strategy was implemented in 2008 (OCaml 3.11.0):
- Same idea as next-fit, but with an extra allocation table.
- Put the pointer back at the beginning of the list for each allocation.
- Use the allocation table to skip some parts of the list.
Unfortunately, that strategy is slower than the previous one. This is an example of making the GC smarter ends up making it slower. It does, however, reduce fragmentation. It was still useful to have this strategy at hand for the case where compaction would be too costly (on a 100Gb heap, for instance). An application that requires low latency might want to disable compaction and use that strategy.
Best-fit: a new challenger enters!
This leads us to the brand new “best-fit” strategy. This strategy is actually composite and will have different behaviors depending on the size of the data you’re trying to allocate.
- On small data (up to 32 words), segregated free lists will allow allocation in (mostly) constant time.
- On big data, a general best-fit allocator based on splay trees .
This allows for the best of the two worlds, as you can easily allocate your numerous small blocks in the small holes in your memory while you take a bit more time to select a good place for your big arrays.
How will best-fit fare? Let’s find out!
Try it!
First, let us remind you that this is still an experimental feature, which from the OCaml development team means “We’ve tested it thoroughly on different systems, but only for months and not on a scale as large as the whole OCaml ecosystem”.
That being said, we’d advise you don’t use it in production code yet.
Why you should try it
Making benchmarks of this new strategy could be beneficial for you and the language at large: the dev team is hoping for feedback, the more quality feedback you give means the more the future GC will be tuned for your needs.
In 2008, the first-fit strategy was released with the hope of improving memory usage by reducing fragmentation. However, the lack of feedback meant that the developers were not aware that it didn’t meet the users’ needs. If more feedback had been given, it’s possible that work on improving the strategy or on better strategies would have happened sooner.
Choosing the allocator strategy
Now, there are two ways to control the GC behavior: through the code or through environment variables.
First method: Adding instructions in your code
This method should be used by those of us who have code that already does some GC fine-tuning. As early as possible in your program, you want to execute the following lines:
let () = Gc.(set { (get()) with allocation_policy = 2; (* Use the best-fit strategy *) space_overhead = 100; (* Let the major GC work a bit less since it's more efficient *) })
You might also want to add verbose = 0x400;
or verbose = 0x404;
in order to get some GC debug information. See here for more details on how to use the GC
module.
Of course, you’ll need to recompile your code, and this will apply only after the runtime has initialized itself, triggering a compaction in the process. Also, since you might want to easily switch between different allocation policies and overhead specifications, we suggest you use the second method.
Second method: setting $OCAMLRUNPARAM
At OCamlPro, we develop and maintain a program that any OCaml developer should want to run smoothly. It’s called Opam , maybe you’ve heard of it? Though most commands take a few seconds, some administrative-heavy commands can be a strain on our computer. In other words: those are perfect for a benchmark.
Here’s what we did to benchmark Opam:
$ opam update $ opam switch create 4.10.0 $ opam install opam-devel # or build your own code $ export OCAMLRUNPARAM='b=1,a=2,o=100,v=0x404' $ cd my/local/opam-repository $ perf stat ~/.opam/4.10.0/lib/opam-devel/opam admin check --installability # requires right to execute perf, time can do the trick
If you want to compile and run your own benchmarks, here are a few details on OCAMLRUNPARAM
:
-
b=1
means “print the backtrace in case of uncaught exception” -
a=2
means “use best-fit” (default is0
, first-fit is1
) -
o=100
means “do less work” (default is80
, lower means more work) -
v=0x404
means “have the gc be verbose” (0x400
is “print statistics at exit”, 0x4 is “print when changing heap size”)
See the manual for more details on OCAMLRUNPARAM
You might want to compare how your code fares on all three different GC strategies (and fiddle a bit with the overhead to find your best configuration).
Our results on opam
Our contribution in this article is to benchmark opam
with the different allocation strategies:
Strategy: | Next-fit | First-fit | Best-fit | ||
Overhead: | 80 | 80 | 80 | 100 | 120 |
Cycles used (Gcycle) | 2,040 | 3,808 | 3,372 | 2,851 | 2,428 |
Maximum heap size (kb) | 793,148 | 793,148 | 689,692 | 689,692 | 793,148 |
User time (s) | 674 | 1,350 | 1,217 | 1,016 | 791 |
A quick word on these results. Most of opam
‘s calculations are done by dose
and rely heavily on small interconnected blocks. We don’t really have big chunks of data we want to allocate, so the strategy won’t give us the bonus you might have as it perfectly falls into the best-case scenario of the next-fit strategy. As a matter of fact, for every strategy, we didn’t have a single GC compaction happen. However, Best-fit still allows for a lower memory footprint!
Conclusions
If your software is highly reliant on memory usage, you should definitely try the new Best-fit strategy and stay tuned on its future development. If your software requires good performance, knowing if your performances are better with Best-fit (and giving feedback on those) might help you in the long run.
The different strategies are:
- Next-fit: generally good and fast, but has very bad worst cases with big heaps.
- First fit: mainly useful for very big heaps that must avoid compaction as much as possible.
- Best-fit: almost the best of both worlds, with a small performance hit for programs that fit well with next-fit.
Remember that whatever works best for you, it’s still better than having to malloc
and free
by hand. Happy allocating!
About OCamlPro
OCamlPro is a R&D lab founded in 2011, with the mission to help industrial users harness the OCaml state-of-the art programming language.
We design, create and implement custom ad-hoc software for our clients in state-of-the-art languages (OCaml, Rust…). We also have a long experience in developing and maintaining open-source tooling for OCaml, such as Opam and ocp-indent, and we contribute to the core-development of OCaml, notably with our work on the Flambda optimizer branch. Another area of expertise is that of Formal Methods, with tools such as our SMT Solver Alt-Ergo (check ourAlt-Ergo Users’ Club). We also provide vocational trainings in OCaml and Rust, and we can build courses on formal methods on-demand. Do not hesitate reach out by email:contact@ocamlpro.com.
以上所述就是小编给大家介绍的《An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Linux命令行与shell脚本编程大全 第3版
[美]布鲁姆,布雷斯纳汉 / 门佳、武海峰 / 人民邮电出版社 / 2016-8-1 / CNY 109.00
这是一本关于Linux命令行与shell脚本编程的全方位教程,主要包括四大部分:Linux命令行,shell脚本编程基础,高级shell脚本编程,如何创建实用的shell脚本。本书针对Linux系统的最新特性进行了全面更新,不仅涵盖了详尽的动手教程和现实世界中的实用信息,还提供了与所学内容相关的参考信息和背景资料。通过本书的学习,你将轻松写出自己的shell脚本。一起来看看 《Linux命令行与shell脚本编程大全 第3版》 这本书的介绍吧!