The choice of C++ allocator implementation can be crucial to C++ application performance. There are many blogs describing the benefits of using jemalloc, tcmalloc, or hoard rather than system allocators like ptmalloc on GNU/Linux. All of these publications share the same flaws:
- They use the dynamic linker to replace malloc and free, and
- They refer to the obsolete gperftools distribution of tcmalloc.
An example of both is the widely linked Percona blog post comparing tcmalloc, jemalloc, and ptmalloc. It shows essentially that ptmalloc falls apart at high parallelism, and that jemalloc and tcmalloc are about the same.
For C++ programs, replacing malloc and free at runtime is the worst choice. When the compiler can see the definition of new and delete at build time it can generate far better programs. When it can’t see them, it generates out-of-line function calls to malloc for every operator new, which is bananas.
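To make the build-time option concrete, here is a minimal sketch of a program that supplies its own global `operator new` and `operator delete`, so the definitions are compiled and linked into the binary rather than swapped in by the dynamic linker. It is only an illustration of the mechanism: the toy allocator still forwards to `std::malloc`, so it has none of tcmalloc's speed, and the names `g_new_calls`, `g_sink`, and `count_new_calls` are hypothetical helpers that exist just to make the replacement observable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter: records how often our replacement operator new runs.
static std::atomic<std::size_t> g_new_calls{0};

// Replacement global operator new, defined in the program itself.
// A real build would get this definition from the allocator library
// (for example tcmalloc) instead of this malloc-forwarding toy.
void* operator new(std::size_t size) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (size == 0) size = 1;  // operator new must return a unique pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

// Matching replacements for the unsized and sized forms of operator delete.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Volatile sink so the compiler cannot elide the new/delete pairs below.
static int* volatile g_sink = nullptr;

// Hypothetical helper: performs n allocations with new-expressions and
// returns how many times the replacement operator new was entered.
std::size_t count_new_calls(int n) {
    const std::size_t before = g_new_calls.load();
    for (int i = 0; i < n; ++i) {
        g_sink = new int(i);
        assert(g_sink != nullptr);
        delete g_sink;
    }
    return g_new_calls.load() - before;
}
```

Calling `count_new_calls(3)` returns 3, which shows that every new-expression in the program is bound to the definition provided at build time.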
Another thing to keep in mind is that the developers of tcmalloc never use it via dynamic preload. They only use it via bazel’s malloc option, which builds the program with the designated allocator. Consequently they don’t have any motivation to improve tcmalloc’s performance and behaviors as a malloc/free library. They are focused on using it as a build-time C++ allocator, and all their work on tcmalloc is guided by its performance in that role.
A more recent blog from IT Hare still falls victim to both flaws listed above, but since their code is on GitHub we can fix it. By properly building their benchmark with modern tcmalloc, we can see how much C++ new/delete performance can be improved. Figures are milliseconds to complete the entire benchmark run. The test system is a 7th-generation Intel Core CPU with 8 threads on 4 cores.
| Threads | jemalloc (ms) | gperftools (ms) | tcmalloc (ms) |
|---------|---------------|-----------------|---------------|
| 1       | 629           | 546             | 358           |
| 2       | 637           | 1638            | 240           |
| 4       | 662           | 3745            | 401           |
| 8       | 1461          | 5216            | 565           |
By using tcmalloc with runtime dynamic loading, we leave a lot of potential performance on the table. The benchmark is dramatically faster when built with tcmalloc.