Benchmarking C++ Allocators

Category: IT Technology · Published: 5 years ago


The choice of allocator implementation can be crucial to C++ application performance. Many blog posts describe the benefits of using jemalloc, tcmalloc, or Hoard rather than a system allocator such as ptmalloc on GNU/Linux. All of these publications share the same flaws:

  1. They use the dynamic linker to replace malloc and free, and
  2. They refer to the obsolete gperftools distribution of tcmalloc.

An example of both is the widely linked Percona blog post comparing tcmalloc, jemalloc, and ptmalloc. It essentially shows that ptmalloc falls apart at high parallelism, and that jemalloc and tcmalloc perform about the same.

For C++ programs, replacing malloc and free at runtime is the worst choice. When the compiler can see the definition of new and delete at build time it can generate far better programs. When it can’t see them, it generates out-of-line function calls to malloc for every operator new, which is bananas.
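To make the distinction concrete, here is a minimal sketch of what build-time replacement looks like from the program's side: the replacement `operator new` and `operator delete` are ordinary definitions compiled and linked with the program, rather than symbols swapped in by the dynamic linker at startup. This is an illustration only, not tcmalloc's implementation. It simply forwards to `malloc`, and the names `g_new_calls` and `count_allocations` are hypothetical. Whether a given compiler actually inlines such definitions depends on the toolchain and optimization settings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, present only to show that the replacement is used.
static std::size_t g_new_calls = 0;

// Replacement global allocation function, defined in the program itself.
// A real build-time allocator would supply its own fast path here; this
// sketch just forwards to malloc.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // operator new must return a unique pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

// Matching replacement deallocation functions (unsized and sized).
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper: performs n new/delete pairs and returns how many
// times the replacement operator new ran while doing so.
std::size_t count_allocations(int n) {
    assert(n >= 0);
    const std::size_t before = g_new_calls;
    for (int i = 0; i < n; ++i) {
        // The volatile pointer keeps the compiler from eliding the pair.
        int* volatile p = new int(i);
        delete p;
    }
    return g_new_calls - before;
}
```

Calling `count_allocations(10)` from `main` routes every `new int` through the definition above. With dynamic preload, by contrast, only the `malloc` and `free` symbols are redirected after the program has already been compiled.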

Another thing to keep in mind is that the developers of tcmalloc never use it via dynamic preload. They only use it via Bazel’s malloc option, which builds the program with the designated allocator. Consequently they have no motivation to improve tcmalloc’s performance and behavior as a malloc/free library. They are focused on using it as a build-time C++ allocator, and all their work on tcmalloc is guided by its performance in that role.

A more recent blog post from IT Hare still falls victim to both #1 and #2, but since their code is on GitHub we can fix it. By properly building their benchmark with modern tcmalloc, we can see how much C++ new/delete performance can be improved. Figures are milliseconds to complete the entire benchmark run. The test system is a 7th-generation Intel Core CPU with 8 threads on 4 cores.

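| Threads | jemalloc (ms) | gperftools (ms) | tcmalloc (ms) |
|---------|---------------|-----------------|---------------|
| 1       | 629           | 546             | 358           |
| 2       | 637           | 1638            | 240           |
| 4       | 662           | 3745            | 401           |
| 8       | 1461          | 5216            | 565           |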
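For readers who want to try a measurement of this shape themselves, the following is a rough sketch of a multi-threaded new/delete timing loop. It is not the IT Hare benchmark and will not reproduce the figures above. The names `BenchResult` and `run_new_delete_benchmark`, the block sizes, and the iteration counts are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result type: wall-clock time and completed new/delete pairs.
struct BenchResult {
    long long elapsed_ms;
    std::size_t operations;
};

// Hypothetical helper: starts `threads` workers, each performing
// `iterations` new[]/delete[] pairs of small, varying sizes.
BenchResult run_new_delete_benchmark(unsigned threads, std::size_t iterations) {
    assert(threads > 0);
    std::vector<std::size_t> done(threads, 0);  // one slot per worker
    std::vector<std::thread> pool;
    pool.reserve(threads);

    auto worker = [&done, iterations](unsigned id) {
        std::size_t count = 0;
        for (std::size_t i = 0; i < iterations; ++i) {
            // Sizes cycle through 16..128 bytes; the volatile pointer keeps
            // the compiler from optimizing the allocation away.
            char* volatile block = new char[16 + (i % 8) * 16];
            delete[] block;
            ++count;
        }
        done[id] = count;  // each worker writes only its own slot
    };

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();

    std::size_t total = 0;
    for (std::size_t d : done) total += d;
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    return BenchResult{static_cast<long long>(ms.count()), total};
}
```

A comparison in the spirit of the table would run the same source at 1, 2, 4, and 8 threads, changing only how the allocator is supplied (preloaded versus built into the program). Absolute numbers from a loop this simple depend heavily on the machine and compiler flags.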

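By using tcmalloc with runtime dynamic loading, we leave a lot of potential performance on the table. The benchmark is dramatically faster when built with tcmalloc: at 8 threads it completes in 565 ms, versus 1461 ms for jemalloc and 5216 ms for gperftools, roughly 2.6x and 9.2x faster respectively.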

