Disk space and LTO improvements

Thanks to the work of Nicholas Nethercote and Alex Crichton, there have been some recent improvements that reduce the size of compiled libraries and improve compile-time performance, particularly when using LTO. This post dives into some of the details of what changed and gives an estimate of the benefits.

These changes have been added incrementally over the past three months, with the latest changes landing just a few days ago on the nightly channel. The bulk of the improvements will be found in the 1.46 stable release (available on 2020-08-27). It would be great for any projects that use LTO to test it out on the nightly channel (starting from the 2020-06-13 release) and report any issues that arise.

Background

When compiling a library, rustc saves the output in an rlib file, which is an archive file. This has historically contained the following:

  • Object code, which is the result of code generation. This is used during regular linking.
  • LLVM bitcode, which is a binary representation of LLVM's intermediate representation. This can be used for Link Time Optimization (LTO).
  • Rust-specific metadata, which covers a wide range of data about the crate.

LTO is an optimization technique that can perform whole-program analysis. It analyzes all of the bitcode from every library at once, and performs optimizations and code generation. rustc supports several forms of LTO:

  • Fat LTO. This performs "full" LTO, which can take a long time to complete and may require a significant amount of memory.
  • Thin LTO. This LTO variant supports much better parallelism than fat LTO. It can achieve performance improvements similar to fat LTO (sometimes even better!), while taking much less total time by taking advantage of more CPUs.
  • Thin-local LTO. By default, rustc will split a crate into multiple "codegen units" so that they can be processed in parallel by LLVM. But this prevents some optimizations, because code is separated into different codegen units that are handled independently. Thin-local LTO will perform thin LTO across the codegen units within a single crate, bringing back some optimizations that would otherwise be lost by the separation. This is rustc's default behavior if opt-level is greater than 0.
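As a rough illustration of how these forms relate to a project's settings, the sketch below maps the values accepted by the `lto` key of a Cargo profile onto the modes listed above. The type and function names (`LtoMode`, `lto_mode`, `crosses_crates`) are hypothetical and exist only for this example; the values are passed as plain strings for simplicity, even though Cargo treats `true` and `false` as booleans.

```rust
/// The LTO forms described above, plus "no LTO at all".
/// Hypothetical type for illustration; not a rustc or Cargo API.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LtoMode {
    Off,
    ThinLocal,
    Thin,
    Fat,
}

/// Map the textual form of a Cargo profile `lto` value to a mode.
/// Note: with `lto = false`, thin-local LTO only applies when opt-level
/// is greater than 0; this sketch ignores that detail.
fn lto_mode(value: &str) -> Option<LtoMode> {
    match value {
        "off" => Some(LtoMode::Off),
        "false" => Some(LtoMode::ThinLocal),
        "thin" => Some(LtoMode::Thin),
        "true" | "fat" => Some(LtoMode::Fat),
        _ => None,
    }
}

/// Only fat and thin LTO look at bitcode from other crates;
/// thin-local LTO stays within a single crate.
fn crosses_crates(mode: LtoMode) -> bool {
    matches!(mode, LtoMode::Thin | LtoMode::Fat)
}
```

The distinction drawn by `crosses_crates` is the one that matters for the rest of this post: only the cross-crate forms need bitcode from dependencies.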

What has changed

Changes have been made to both rustc and Cargo to control which libraries should include object code and which should include bitcode, based on the project's profile LTO settings. If the project is not using LTO, then Cargo will instruct rustc to not place bitcode in the rlib files, which should reduce the amount of disk space used. This may also improve performance slightly, since rustc no longer needs to compress and write out the bitcode.

If the project is using LTO, then Cargo will instruct rustc to not place object code in the rlib files, avoiding the expensive code generation step. This should improve the build time when building from scratch, and reduce the amount of disk space used.

Two rustc flags are now available to control how the rlib is constructed:

  • -C linker-plugin-lto causes rustc to only place bitcode in the .o files, and skips code generation. This flag was originally added to support cross-language LTO. Cargo now uses this when the rlib is only intended for use with LTO.
  • -C embed-bitcode=no causes rustc to avoid placing bitcode in the rlib altogether. Cargo uses this when LTO is not being used, which reduces some disk space usage.
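The rule described in this section can be sketched as a small decision function. This is a simplified model, not Cargo's actual implementation: the names `RlibContents`, `contents_for`, and `rustc_flags` are hypothetical, and the sketch assumes every rlib is either used only with LTO or never used with LTO.

```rust
/// What an rlib needs to contain under the simplified rule above.
/// Hypothetical type for illustration; not part of Cargo or rustc.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum RlibContents {
    /// Object code only: the rlib is never fed to LTO.
    ObjectOnly,
    /// Bitcode only: the rlib is only consumed by LTO.
    BitcodeOnly,
}

/// Pick the rlib contents from whether the project uses LTO.
fn contents_for(uses_lto: bool) -> RlibContents {
    if uses_lto {
        RlibContents::BitcodeOnly
    } else {
        RlibContents::ObjectOnly
    }
}

/// The rustc flag from the list above that produces each kind of rlib.
fn rustc_flags(contents: RlibContents) -> [&'static str; 2] {
    match contents {
        RlibContents::ObjectOnly => ["-C", "embed-bitcode=no"],
        RlibContents::BitcodeOnly => ["-C", "linker-plugin-lto"],
    }
}
```

For example, `rustc_flags(contents_for(true))` yields the `-C linker-plugin-lto` pair, while `rustc_flags(contents_for(false))` yields `-C embed-bitcode=no`.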

Additionally, the way bitcode is embedded in the rlib has changed. Previously, rustc would place compressed bitcode as a .bc.z file in the rlib archive. Now, the bitcode is placed as an uncompressed section within each .o object file in the rlib archive. This can sometimes be a small performance benefit, because it avoids the cost of compressing the bitcode, but it can sometimes be slower because more data must be written to disk. This change helped simplify the implementation, and also matches the behavior of clang's -fembed-bitcode option (typically used with Apple's iOS-based operating systems).

Improvements

The following is a summary of improvements observed on a small number of real-world projects of small and medium size. The improvement for any given project will depend heavily on the code, optimization settings, operating system, environment, and hardware. These were recorded with the 2020-06-21 nightly release on Linux with parallel job settings between 2 and 32.

Debug builds were anywhere from 0% to 4.7% faster. Larger binary crates tended to fare better than smaller library crates.

LTO builds were recorded anywhere from 4% to 20% faster. Thin LTO fared consistently better than fat LTO.

The number of parallel jobs also had a large impact on the amount of improvement. Lower parallel job counts saw substantially more benefit than higher ones. A project built with -j2 can be 20% faster, whereas the same project at -j32 would only be 1% faster. Presumably this is because the code-generation phase benefits from higher concurrency, so it was taking a relatively smaller total percentage of time.
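A toy model makes this presumption concrete. It assumes, purely for illustration, that the skipped code-generation work parallelizes perfectly across jobs while the rest of the build takes a fixed amount of wall-clock time; real builds are messier than this. The function name `fraction_saved` and the input numbers in the note below are made up and are not measurements from the projects above.

```rust
/// Fraction of wall-clock time saved by skipping a phase that costs
/// `codegen_cpu_secs` of CPU time and parallelizes perfectly across
/// `jobs`, when the rest of the build takes `other_secs` regardless
/// of the job count. Both assumptions are idealized.
fn fraction_saved(other_secs: f64, codegen_cpu_secs: f64, jobs: u32) -> f64 {
    // Wall-clock time the skipped phase would have taken at this job count.
    let codegen_wall = codegen_cpu_secs / f64::from(jobs);
    // Share of the original total build time that disappears.
    codegen_wall / (other_secs + codegen_wall)
}
```

With the made-up inputs of 8 seconds of other work and 4 CPU-seconds of skipped code generation, `fraction_saved(8.0, 4.0, 2)` is 0.2 (a 20% saving), while `fraction_saved(8.0, 4.0, 32)` is about 0.015. The same skipped work matters far less once it is spread over many jobs.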

The overall target directory size is typically 20% to 30% smaller for debug builds. LTO builds did not see as much of an improvement, ranging from 11% to 19% smaller.

More details

Nicholas Nethercote wrote about the journey to implement these changes at https://blog.mozilla.org/nnethercote/2020/04/24/how-to-speed-up-the-rust-compiler-in-2020/ . It took several PRs across rustc and Cargo to make this happen.

Conclusion

Although this is a conceptually simple change (LTO=bitcode, non-LTO=object code), it took quite a bit of preparation and work to make it happen. There were many edge cases and platform-specific behaviors to consider, and testing to perform. And, of course, the obligatory bike-shedding over the names of new command-line flags. This resulted in quite a substantial improvement in performance, particularly for LTO builds, and a huge improvement in disk space usage. Thanks to all of those that helped to make this happen!

