内容简介:LLVM is not good at big functions, I have seen this before as well. One of the functions you are generating is almost 100,000 lines of Rust code including tons of internal control flow: starting
iceiix #1
I have a very large macro instantiation, which takes 4 or 5 hours to compile with Rust 1.40, so I'm looking to improve this unreasonably long build time.
The macro wasn't always this slow to build, it seems to have gotten slower with newer versions of Rust, although I haven't pinpointed the exact point it regressed, it was always still slower than I would have liked (I recall it was around 30 minutes in earlier versions, still uncomfortably long for new contributors). I have avoided modifying this file as long as I can to avoid the long recompilation on my system, but I plan to expand this macro in the near future, so it is time to finally tackle this compilation problem.
What can I do to reduce the multi-hour compile times for this macro?
Here is the file with the macro in question, for reference:
The big macro is define_blocks
.
Here's what I have tried so far:
1. Reduce the size of the macro instantiation
If I remove all but two of the "blocks" (definitions which the macro matches against), then a release build completes in less than a second. So the compile time is definitely correlated with the size of what I pass to the macro. I haven't bothered to gather more data points (number of blocks x time to build) and graph the relationship (exponential?), but it is clearly related.
This is the first thing I tried, confirming the macro is the cause of the slowdown. But I actually need to pass all those definitions to the macro, because that's what it is for.
I suppose I could rewrite this code by hand to not use macros. This could be one potential fix for the problem but I have a lot of functionality invested into this macro and would like to keep for now it if possible. Which brings me to:
2. Enable macro expansion tracing
Using the nightlytrace_macros feature, with trace_macros!(true)
before define_blocks!()
.
Observing the trace output, the macro expansion completes in about 9 seconds. Not hours, so the problem must be elsewhere.
3. Profile the compiler
Building with the -Z time-passes
rustc compiler flag described in Guide to Rustc Development
. Using the same optimization level as in the default release profile:
blocks $ cargo clean ; time cargo rustc -- -C 'opt-level=3' -Z time-passes
Completed in 5 hours 9 minutes. The end of the output shows what is taking so long:
time: 0.002 codegen passes [4gqa868n29dtlfbb] time: 0.652 LTO passes time: 0.075 codegen passes [473etbgnzq2x8af8] time: 3.476 LTO passes time: 4.218 LTO passes time: 4.255 codegen passes [22fyvf1uiha0mrri] time: 4.234 codegen passes [6ffwyc85r8r5jgh] time: 63.064 LTO passes time: 15574.417 codegen passes [2w3r8ppd44eh9kz5] time: 15707.329 LLVM passes time: 0.003 serialize work products time: 0.746 linking time: 15771.815 total Finished dev [unoptimized + debuginfo] target(s) in 263m 19s real 309m45.295s user 198m44.124s sys 14m54.290s
15574.417 seconds spent in "codegen passes" from LLVM. This sounds optimization-related.
4. Compare optimization levels
No optimization, finishes in less than 2 minutes:
cargo clean ; time cargo rustc -- -C 'opt-level=0' -Z time-passes Finished dev [unoptimized + debuginfo] target(s) in 1m 41s
Basic optimization, only 2 minutes:
cargo clean ; time cargo rustc -- -C 'opt-level=1' -Z time-passes Finished dev [unoptimized + debuginfo] target(s) in 2m 05s
"Some" optimizations (opt-level=2) ran for 3.7+ hours and consumed 20 GB of memory before I killed the process, so I estimate it takes on the same order of magnitude as full optimizations (opt-level=3) as in the release build (5.15 hours). Whatever is causing the slowdown occurs with the optimization passes enabled at optimization level 2+.
5. Count lines of LLVM IR generated
Found this handy cargo llvm-lines
tool here: Improve compile time and executable size by counting lines of LLVM IR
When I run it on my project (in an debug build so it doesn't take forever), the top functions are:
Compiling steven_blocks v0.0.1 (steven/blocks) Finished dev [unoptimized + debuginfo] target(s) in 1m 19s 151828 2 core::ops::function::FnOnce::call_once 43954 488 core::option::Option<T>::map 24252 896 core::ptr::real_drop_in_place 19669 1 steven_blocks::Block::get_model 15834 39 alloc::raw_vec::RawVec<T,A>::reserve_internal 14674 1 steven_blocks::Block::get_flat_offset 12548 1 steven_blocks::Block::get_hierarchical_data 12362 1 steven_blocks::Block::get_collision_boxes 11195 1 steven_blocks::Block::get_model_variant 9678 1 <steven_blocks::Block as core::fmt::Debug>::fmt
The getters in Block are all macro-generated code that matches on *self
to insert arbitrary expressions specific to the "block", for example this is get_model:
#[allow(unused_variables)] pub fn get_model(&self) -> (String, String) { match *self { $( Block::$name { $($fname,)? } => { let parts = $model; (String::from(parts.0), String::from(parts.1)) } )+ } }
19669 is a lot of lines of IR, but I don't think this is incorrect. There really are that many expressions (currently 300, one for each block definition), each of varying complexity.
A possible angle of attack could be to try to rewrite the code with the goal of reducing the number of IR lines generated. Open to suggestions of how to do this. One idea is to replace the huge match
generated code with a data structure, such as a HashMap. Mapping the keys should be straightforward (the block identifiers), but the values are arbitrary expressions dependent on other variables defined in the block definition. Which is why this was all generated by a macro in the first place. Would like to preserve this macro as-is if I can.
6. Reduce optimization level in sub-crate manifest
As the crate compiles 153x faster with basic optimizations, I thought I would settle for reducing the optimization level from the default full optimizations for the release profile, by adding this to blocks/Cargo.toml
:
[profile.release] opt-level = 1
This worked... when I tested the crate by itself. However, the crate is nested in another project, and it turns out the manifest profile configuration is actually ignored except at the top-level, as documented inThe Cargo Book:
Cargo supports custom configuration of how rustc is invoked through profiles at the top level. Any manifest may declare a profile, but only the top level package’s profiles are actually read. All dependencies’ profiles will be overridden. This is done so the top-level package has control over how its dependencies are compiled.
If I reduce the optimization level at the top-level package, then building is fast as expected, but I want to optimize everything except this blocks-macro package as much as possible (in fact my program won't run properly without full optimizations, it is too slow). Only the crate with the gigantic define_blocks!
macro builds slowly, so that's all I need to reduce the optimization level for, but it fails due to this rule.
7. Reduce optimization level in sub-crate config profile
Can dependencies have their own separate opt-level? Not yet, butconfig profiles should allow this in nightly, and as I understand it, profile-overrides will be in Rust 1.41.
First attempt, I placed this file in blocks/.cargo/config:
[profile.release] opt-level = 1
and executed with cargo +nightly build --release -Z config-profile
. But using the profile-overrides feature should work come 1.41 and it does work well in nightly, by adding this to the top-level Cargo.toml:
cargo-features = ["profile-overrides"] ... [profile.release.package.steven_blocks] opt-level = 1
Finally, I have a decent workaround. I'll have to switch from stable to nightly, for the time being, and sacrifice the optimizations for this crate, but at least I can develop and iterate quickly.
Yet I am still left wondering, what exactly in LLVM optimization could be this slow? Is there anyway I can dig deeper into the 4 hour "codegen passes" phase? Is it possible to identify and disable or speedup the problematic optimization?
7 Likes
dtolnay #2
LLVM is not good at big functions, I have seen this before as well. One of the functions you are generating is almost 100,000 lines of Rust code including tons of internal control flow: starting here , repeated 300× in the definition of VANILLA_ID_MAP. You should reorganize define_blocks to break this one function up into 300 individual functions and it will compile in minutes.
15 Likes
iceiix #3
Perfect, thank you! Was hoping there was a straightforward refactoring possible to address this, and indeed breaking up the gigantic function covering all blocks, into multiple functions (one per block), addresses the issue.
In case anyone else runs across a similar problem, to summarize, the pathologically slow macro implementation was something like this:
lazy_static! { static ref VANILLA_ID_MAP: VanillaIDMap = { $({ /* large amount of code for $name */ })+ };
This creates one huge function containing repeated code for each block, as seen with trace_macros ( {...} {...}
... ad infinitum). So I moved each block registration into their own function, in a new submodule, then call them all in lazy_static:
mod block_registration_functions { use super::*; $( pub fn $name(...) { /* large amount of code for $name */ } )+ } lazy_static! { static ref VANILLA_ID_MAP: VanillaIDMap = { $( block_registration_functions::$name(...) )+ };
(The complete code change: https://github.com/iceiix/stevenarella/pull/267/files#diff-ded23d9a49a226cef06ee7fe5e00ebb2 )
This simple change allows the crate to build not in 5 hours, but 40 seconds! (450x as fast)
10 Likes
以上所述就是小编给大家介绍的《How refactoring a macro reduced my Rust project build time from 4 hours to 40sec》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
国际游戏设计全教程
[美]迈克尔·萨蒙德 / 张然、赵嫣 / 中国青年出版社 / 2017-2 / 108.00元
你想成为一名电子游戏设计师吗?想知道《肯塔基0号路》《到家》《枪口》等独立游戏的制作理念及过程吗?想了解《戈莫布偶大冒险》《辐射3》《战争机器》中关卡设计的奥秘吗?本书用通俗易懂的文字介绍了在游戏开发与策划过程中,需要掌握的游戏设计原理和制作的基础知识,可以作为读者从“构思一个电子游戏”到“真正完成一个电子游戏”的完备指南。 本书以系统的游戏设计流程结合大量优秀的游戏设计案例进行讲解,让读者......一起来看看 《国际游戏设计全教程》 这本书的介绍吧!