An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy

栏目: IT技术 · 发布时间: 4年前

内容简介:But as OCaml 4.10.0 has now hit the shelves, a very exciting feature isAt OCamlPro, some of the tools that we develop, such as the

An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy The Garbage Collector probably is OCaml’s greatest unsung hero. Its pragmatic approach allows us to allocate without much fear of efficiency loss. In a way, the fact that most OCaml hackers know little about it is a good sign: you want a runtime to gracefully do its job without having to mind it all the time.

But as OCaml 4.10.0 has now hit the shelves, a very exciting feature is in the changelog :

- #8809, #9292: Add a best-fit allocator for the major heap; still
experimental, it should be much better than current allocation
policies (first-fit and next-fit) for programs with large heaps,
reducing both GC cost and memory usage.
This new best-fit is not (yet) the default; set it explicitly with
OCAMLRUNPARAM="a=2" (or Gc.set from the program). You may also want
to increase the `space_overhead` parameter of the GC (a percentage,
80 by default), for example OCAMLRUNPARAM="o=85", for optimal
speed.
(Damien Doligez, review by Stephen Dolan, Jacques-Henri Jourdan,
Xavier Leroy, Leo White)

At OCamlPro, some of the tools that we develop, such as the Opam installer, theAlt-Ergo SMT solver or the Flambda optimizer, can be quite demanding in memory usage, so we were curious to better understand the properties of this new allocator.

Minor heap and Major heap: the GC in a nutshell

Not all values are allocated equal. Some will only be useful for the span of local calculations, some will last as long as the program lives. To handle those two kinds of values, the runtime uses a Generational Garbage Collector with two spaces:

  • The minor heap uses the Stop-and-copy principle. It is fast but has to stop the computation to perform a full iteration.
  • The major heap uses the Mark-and-sweep principle. It has the perk of being incremental and behaves better for long-lived data.

Allocation in the minor heap is straightforward and efficient: values are stored sequentially, and when there is no space anymore, space is emptied, surviving values get allocated in the major heap while dead values are just forgotten for free. However, the major heap is a bit more tricky, since we will have random allocations and deallocations that will eventually produce a scattered memory. This is called fragmentation , and this means that you’re using more memory than necessary. Thankfully, the GC has two strategies to counter that problem:

  • Compaction: a heavyweight reallocation of everything that will remove those holes in our heap. OCaml’s compactor is cleverly written to work in constant space, and would be worth its own specific article!
  • Free-list Allocation: allocating the newly coming data in the holes (the free-list) in memory, de-scattering it in the process.

Of course, asking the GC to be smarter about how it allocates data makes the GC slower. Coding a good GC is a subtle art: you need to have something smart enough to avoid fragmentation but simple enough to run as fast as possible.

Where and how to allocate: the 3 strategies

OCaml used to propose 2 free-list allocation strategies: next-fit , the default, and first-fit . Version 4.10 of OCaml introduces the new best-fit strategy. Let’s compare them:

Next-fit, the original and remaining champion

OCaml’s original (and default) “next-fit” allocating strategy is pretty simple:

  1. Keep a (circular) list of every hole in memory ordered by increasing addresses;
  2. Have a pointer on an element of that list;
  3. When an allocation is needed, if the currently pointed-at hole is big enough, allocate in it;
  4. Otherwise, try the next hole and so-on.

This strategy is extremely efficient, but a big hole might be fragmented with very small data while small holes stay unused. In some cases, the GC would trigger costly compactions that would have been avoidable.

First-fit, the unsuccessful contender

To counteract that problem, the “first-fit” strategy was implemented in 2008 (OCaml 3.11.0):

  • Same idea as next-fit, but with an extra allocation table.
  • Put the pointer back at the beginning of the list for each allocation.
  • Use the allocation table to skip some parts of the list.

Unfortunately, that strategy is slower than the previous one. This is an example of making the GC smarter ends up making it slower. It does, however, reduce fragmentation. It was still useful to have this strategy at hand for the case where compaction would be too costly (on a 100Gb heap, for instance). An application that requires low latency might want to disable compaction and use that strategy.

Best-fit: a new challenger enters!

This leads us to the brand new “best-fit” strategy. This strategy is actually composite and will have different behaviors depending on the size of the data you’re trying to allocate.

  • On small data (up to 32 words), segregated free lists will allow allocation in (mostly) constant time.
  • On big data, a general best-fit allocator based on splay trees .

This allows for the best of the two worlds, as you can easily allocate your numerous small blocks in the small holes in your memory while you take a bit more time to select a good place for your big arrays.

How will best-fit fare? Let’s find out!

Try it!

First, let us remind you that this is still an experimental feature, which from the OCaml development team means “We’ve tested it thoroughly on different systems, but only for months and not on a scale as large as the whole OCaml ecosystem”.

That being said, we’d advise you don’t use it in production code yet.

Why you should try it

Making benchmarks of this new strategy could be beneficial for you and the language at large: the dev team is hoping for feedback, the more quality feedback you give means the more the future GC will be tuned for your needs.

In 2008, the first-fit strategy was released with the hope of improving memory usage by reducing fragmentation. However, the lack of feedback meant that the developers were not aware that it didn’t meet the users’ needs. If more feedback had been given, it’s possible that work on improving the strategy or on better strategies would have happened sooner.

Choosing the allocator strategy

Now, there are two ways to control the GC behavior: through the code or through environment variables.

First method: Adding instructions in your code

This method should be used by those of us who have code that already does some GC fine-tuning. As early as possible in your program, you want to execute the following lines:

let () =
Gc.(set
  { (get()) with
    allocation_policy = 2; (* Use the best-fit strategy *)
    space_overhead = 100; (* Let the major GC work a bit less since it's more efficient *)
  })

You might also want to add verbose = 0x400; or verbose = 0x404; in order to get some GC debug information. See here for more details on how to use the GC module.

Of course, you’ll need to recompile your code, and this will apply only after the runtime has initialized itself, triggering a compaction in the process. Also, since you might want to easily switch between different allocation policies and overhead specifications, we suggest you use the second method.

Second method: setting $OCAMLRUNPARAM

At OCamlPro, we develop and maintain a program that any OCaml developer should want to run smoothly. It’s called Opam , maybe you’ve heard of it? Though most commands take a few seconds, some administrative-heavy commands can be a strain on our computer. In other words: those are perfect for a benchmark.

Here’s what we did to benchmark Opam:

$ opam update
$ opam switch create 4.10.0
$ opam install opam-devel # or build your own code
$ export OCAMLRUNPARAM='b=1,a=2,o=100,v=0x404'
$ cd my/local/opam-repository
$ perf stat ~/.opam/4.10.0/lib/opam-devel/opam admin check --installability # requires right to execute perf, time can do the trick

If you want to compile and run your own benchmarks, here are a few details on OCAMLRUNPARAM :

  • b=1 means “print the backtrace in case of uncaught exception”
  • a=2 means “use best-fit” (default is 0 , first-fit is 1 )
  • o=100 means “do less work” (default is 80 , lower means more work)
  • v=0x404 means “have the gc be verbose” ( 0x400 is “print statistics at exit”, 0x4 is “print when changing heap size”)

See the manual for more details on OCAMLRUNPARAM

You might want to compare how your code fares on all three different GC strategies (and fiddle a bit with the overhead to find your best configuration).

Our results on opam

Our contribution in this article is to benchmark opam with the different allocation strategies:

Strategy: Next-fit First-fit Best-fit
Overhead: 80 80 80 100 120
Cycles used (Gcycle) 2,040 3,808 3,372 2,851 2,428
Maximum heap size (kb) 793,148 793,148 689,692 689,692 793,148
User time (s) 674 1,350 1,217 1,016 791

A quick word on these results. Most of opam ‘s calculations are done by dose and rely heavily on small interconnected blocks. We don’t really have big chunks of data we want to allocate, so the strategy won’t give us the bonus you might have as it perfectly falls into the best-case scenario of the next-fit strategy. As a matter of fact, for every strategy, we didn’t have a single GC compaction happen. However, Best-fit still allows for a lower memory footprint!

Conclusions

If your software is highly reliant on memory usage, you should definitely try the new Best-fit strategy and stay tuned on its future development. If your software requires good performance, knowing if your performances are better with Best-fit (and giving feedback on those) might help you in the long run.

The different strategies are:

  • Next-fit: generally good and fast, but has very bad worst cases with big heaps.
  • First fit: mainly useful for very big heaps that must avoid compaction as much as possible.
  • Best-fit: almost the best of both worlds, with a small performance hit for programs that fit well with next-fit.

Remember that whatever works best for you, it’s still better than having to malloc and free by hand. Happy allocating!

About OCamlPro

OCamlPro is a R&D lab founded in 2011, with the mission to help industrial users harness the OCaml state-of-the art programming language.

We design, create and implement custom ad-hoc software for our clients in state-of-the-art languages (OCaml, Rust…). We also have a long experience in developing and maintaining open-source tooling for OCaml, such as Opam and ocp-indent, and we contribute to the core-development of OCaml, notably with our work on the Flambda optimizer branch. Another area of expertise is that of Formal Methods, with tools such as our SMT Solver Alt-Ergo (check ourAlt-Ergo Users’ Club). We also provide vocational trainings in OCaml and Rust, and we can build courses on formal methods on-demand. Do not hesitate reach out by email:contact@ocamlpro.com.


以上所述就是小编给大家介绍的《An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Webbots、Spiders和Screen Scrapers

Webbots、Spiders和Screen Scrapers

斯昆克 / 2013-5 / 69.00元

《Webbots、Spiders和Screen Scrapers:技术解析与应用实践(原书第2版)》共31章,分为4个部分:第一部分(1~7章),系统全面地介绍了与Webbots、Spiders、Screen Scrapers相关的各种概念和技术原理,是了解和使用它们必须掌握的基础知识;第二部分(8~16章),以案例的形式仔细地讲解了价格监控、图片抓取、搜索排名检测、信息聚合、FTP信息、阅读与发......一起来看看 《Webbots、Spiders和Screen Scrapers》 这本书的介绍吧!

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

HSV CMYK 转换工具
HSV CMYK 转换工具

HSV CMYK互换工具