When it comes to runtime performance, Rust is one of the fastest guns in the west. It is on par with the likes of C and C++ and sometimes even surpasses them. Compile times, however? That's a different story.
Why Is Rust Compilation Slow?
Wait a sec, slow in comparison to what? For example, if you compare it with Go, the Go compiler is doing a lot less work in general. It lacks support for generics and macros. Also, the Go compiler was built from scratch as a monolithic tool consisting of both the frontend and the backend (rather than relying on, say, LLVM to take over the backend part, which is the case for Rust or Swift). This has advantages (more options for tweaking the entire process, yay) and disadvantages (higher maintenance costs and fewer supported architectures).
Comparing across toolchains makes little sense here, and compile times are mostly fine for smaller projects, so if your project builds fast enough, your job here is done.
Choosing Runtime Over Compile-Time Performance
Programming language design is full of tradeoffs, as language designers often caution. One of those fundamental tradeoffs is runtime performance vs. compile-time performance, and the Rust team nearly always (if not always) chose runtime over compile-time.
Overall, there are a few features and design decisions that limit Rust compilation speed:
- Macros: Code generation with macros can be quite expensive.
- Type checking
- Monomorphization: this is the process of generating specialized versions of generic functions. E.g., a function that takes an `Into<String>` gets converted into one that takes a `String` and one that takes a `&str` (see the sketch below).
- LLVM: that's the default compiler backend for Rust, where a lot of the heavy lifting (like code optimizations) takes place. LLVM is notorious for being slow.
- Linking: Strictly speaking, this is not part of compiling but happens right after. It "connects" your Rust binary with the system libraries. `cargo` does not explicitly mark the linking step, so many people add it to the overall compilation time.
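To make the monomorphization point concrete, here's a minimal sketch (my example, not from the article): one generic function in the source turns into one compiled copy per concrete argument type.

```rust
// One generic function in the source...
fn print_it(thing: impl Into<String>) {
    println!("{}", thing.into());
}

fn main() {
    // ...but two concrete call sites, so the compiler generates
    // (monomorphizes) two specialized versions of `print_it`:
    // one for `&str` and one for `String`.
    print_it("hello");
    print_it(String::from("world"));
}
```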
If you're interested in all the gory details, check out this blog post by Brian Anderson.
Upstream Work
Making the Rust compiler faster is an ongoing process, and many fearless people are working on it. Thanks to their hard work, compiler speed has improved 30-40% across the board year-to-date, with some projects seeing up to 45%+ improvements. On top of that, Rust tracks compile regressions on a website dedicated to performance.
Work is also put into optimizing the LLVM backend. Rumor has it that there's still a lot of low-hanging fruit. :grapes:
Why Bother?
Overall, the Rust compiler is legitimately doing a great job. That said, above a certain project size, the compile times are... let's just say they could be better.
According to the Rust 2019 survey, improving compile times is #4 on the Rust wishlist.
But all hope is not lost! Below is a list of tips and tricks on how to make your Rust project compile faster today. They are roughly ordered by practicality, so start at the top and work your way down until you're happy.
Use `cargo check` Instead Of `cargo build`
Most of the time, you don't even have to compile your project at all; you just want to know if you messed up somewhere. Whenever you can, skip compilation altogether. What you want instead is laser-fast code linting, type- and borrow-checking.
For that, cargo has a special treat for you: :sparkles: `cargo check` :sparkles:. Consider the differences in the number of instructions between `cargo check` on the left and a debug build (`cargo build`) in the middle. (Note that the scales are different.)
A sweet trick I use is to run it in the background with `cargo watch`. This way, it will run `cargo check` whenever you change a file.
:star: Pro-tip: Use `cargo watch -c` to clear the screen before every run.
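If you haven't set it up before, a minimal setup might look like this (using cargo-watch's `-x` flag to run a cargo command on every change):

```
# Install the helper once, then let it re-run `cargo check`
# (with a cleared screen) on every file change:
cargo install cargo-watch
cargo watch -c -x check
```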
Use Rust Analyzer Instead Of Rust Language Server
Another quick way to check if you set the codebase on fire is to use a "language server". That's basically a "linter as a service" that runs next to your editor.
For a long time, the default choice here was rls, but lately, folks have moved over to rust-analyzer, because it's more feature-complete and way more snappy. It supports all major IDEs. Switching to it alone might save your day.
Remove Unused Dependencies
So let's say you have tried all of the above and find that compilation is still slow. What now?
Dependencies sometimes become obsolete after refactoring. From time to time, it helps to check whether all of them are still needed, to save compile time.
If this is your own project (or a project you like to contribute to), do a quick check if you can toss anything with cargo-udeps:

```
cargo install cargo-udeps && cargo +nightly udeps
```
Update Remaining Dependencies
Next, update your dependencies, because they themselves could have tidied up their dependency tree lately.
Take a deep dive with `cargo-outdated` or `cargo tree` (built right into cargo itself) to find any outdated dependencies. On top of that, use `cargo audit` to get notified about any vulnerabilities which need to be addressed, or deprecated crates which need a replacement.
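For instance, assuming you haven't installed it yet, the usual invocation is:

```
# cargo-audit checks your Cargo.lock against the RustSec
# advisory database:
cargo install cargo-audit
cargo audit
```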
Here's a nice workflow that I learned from /u/oherrala on Reddit:

1. Run `cargo update` to update to the latest semver-compatible versions.
2. Run `cargo outdated -wR` to find newer, possibly incompatible dependencies. Update those and fix code as needed.
3. Find duplicate versions of a dependency and figure out where they come from: `cargo tree --duplicates` shows dependencies which come in multiple versions.

(Thanks to /u/dbdr for pointing this out.)
:star: Pro-tip : Step 3 is a great way to contribute back to the community! Clone the repository and execute steps 1 and 2. Finally, send a pull request to the maintainers.
Replace Heavy Dependencies
From time to time, it helps to shop around for more lightweight alternatives to popular crates.
Again, `cargo tree` is your friend here to help you understand which of your dependencies are quite heavy: they require many other crates, cause excessive network I/O, and slow down your build. Then search for lighter alternatives.
Here are a few examples:
- Using serde? Check out miniserde and maybe even nanoserde.
- reqwest is quite heavy. Maybe try attohttpc, which is more lightweight.
- tokio dragging you down? How about smol? (Edit: This won't help much with build times. More info.)
- Swap out clap with pico-args if you only need basic argument parsing.
Here's an example where switching crates reduced compile times from 2:22min to 26 seconds.
Use Cargo Workspaces
Cargo has that neat feature called workspaces, which allow you to split one big crate into multiple smaller ones. This code-splitting is great for avoiding repetitive compilation because only crates with changes have to be recompiled. Bigger projects like servo and vector are using workspaces heavily to slim down compile times.
Learn more about workspaces here.
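As a rough sketch (the member names are made up for illustration), a workspace is just a top-level Cargo.toml listing its member crates:

```toml
# Top-level Cargo.toml of the workspace; each member is a
# normal crate in its own subdirectory with its own Cargo.toml.
[workspace]
members = [
    "app",     # the binary crate
    "parser",  # a library crate that rarely changes
]
```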
Disable Unused Features Of Crate Dependencies
:warning: Fair warning: it seems that switching off features doesn't always improve compile time. (See tikv's experiences here.)
Check the feature flags of your dependencies. A lot of library maintainers go to the trouble of splitting their crate into separate features that can be toggled off on demand. Maybe you don't need all the default functionality from every crate?
For example, `tokio` has a ton of features that you can disable if needed.
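In Cargo.toml, that boils down to opting out of the default feature set and listing only what you use. The exact feature names below are just an example and depend on the tokio version you're on:

```toml
[dependencies]
# Pull in only the runtime and macros instead of everything:
tokio = { version = "1", default-features = false, features = ["rt", "macros"] }
```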
A quick way to list the features of a crate is `cargo-feature-set`.
Admittedly, features are not very discoverable at the moment because there is no standard way to document them, but we'll get there eventually.
Use A Ramdisk For Compilation
When starting to compile heavy projects, I noticed that I was throttled on I/O. The reason was that I kept my projects on a measly HDD. A more performant alternative would be SSDs, but they usually have limited write-cycles.
Ramdisks to the rescue! These are like "virtual harddisks" that live in system memory.
User moschroe_de shared the following snippet over on Reddit, which creates a ramdisk for your current Rust project (on Linux):

```
mkdir -p target && \
sudo mount -t tmpfs none ./target && \
cat /proc/mounts | grep "$(pwd)" | sudo tee -a /etc/fstab
```
On macOS, you could probably do something similar with this script. I haven't tried that myself, though.
Cache Dependencies With sccache
Another neat project is sccache by Mozilla, which caches compiled crates to avoid repeated compilation.
I had this running on my laptop for a while, but the benefit was rather negligible, to be honest. It works best if you work on a lot of independent projects that share dependencies (in the same version). A common use-case is shared build servers.
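Setup is small: you install the binary and point cargo at it via the documented `RUSTC_WRAPPER` environment variable.

```
cargo install sccache
export RUSTC_WRAPPER=sccache
cargo build   # compiled crates now go through the cache
```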
Cranelift – The Alternative Rust Compiler
Lately, I was excited to hear that the Rust project is using an alternative compiler that runs in parallel with `rustc` for every CI build: Cranelift, also called `CG_CLIF`.
Here is a comparison between `rustc` and Cranelift for some popular crates (blue means better):
Somewhat unbelieving, I tried to compile vector with both compilers.
The results were astonishing:
- Rustc: 5m 45s
- Cranelift: 3m 13s
I could really feel the difference! What's cool about this is that it creates fully working executable binaries. They won't be optimized as much, but they are great for testing.
A more detailed write-up is on Jason Williams' page, and the project code is on Github.
Switch To A Faster Linker
The thing that nobody seems to target is linking time. For me, when using something with a big dependency tree like Amethyst, for example, linking time on my fairly recent Ryzen 7 1700 is ~10s each time, even if I change only some minute detail in my code. — /u/Almindor on Reddit
According to the official documentation, "LLD is a linker from the LLVM project that is a drop-in replacement for system linkers and runs much faster than them. [..] When you link a large program on a multicore machine, you can expect that LLD runs more than twice as fast as the GNU gold linker. Your mileage may vary, though."
If you're on Linux, you can switch to `lld` like so:

```toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```
A word of caution: `lld` might not be working on all platforms yet. At least on macOS, Rust support seems to be broken at the moment, and the work on fixing it has stalled (see rust-lang/rust#39915).
Tweak Compiler Flags
Rust comes with a huge set of compiler flags. For special cases, it can help to tweak them for your project.
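One example of the kind of tweak this can mean (my illustration, not from the article): skipping debug info in dev builds via a Cargo profile, which can speed up debug compiles if you don't need a debugger.

```toml
# In Cargo.toml:
[profile.dev]
debug = 0   # don't emit debug info; faster builds, no debugger support
```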
Profile Compile Times
If you'd like to dig deeper, Rust compilation can be profiled with `cargo rustc -- -Zself-profile`. The resulting trace file can be visualized with a flamegraph or the Chromium profiler.
There's also a `cargo -Z timings` feature that gives some information about how long each compilation step takes, and tracks concurrency information over time.
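Like all `-Z` flags, it requires a nightly toolchain; at the time of writing, it writes an HTML report (cargo-timing.html) into the current directory:

```
cargo +nightly build -Z timings
```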
Another golden one is `cargo-llvm-lines`, which shows the number of lines generated and objects copied in the LLVM backend:
```
$ cargo llvm-lines | head -20

  Lines        Copies        Function name
  -----        ------        -------------
  30737 (100%)  1107 (100%)  (TOTAL)
   1395 (4.5%)    83 (7.5%)  core::ptr::drop_in_place
    760 (2.5%)     2 (0.2%)  alloc::slice::merge_sort
    734 (2.4%)     2 (0.2%)  alloc::raw_vec::RawVec<T,A>::reserve_internal
    666 (2.2%)     1 (0.1%)  cargo_llvm_lines::count_lines
    490 (1.6%)     1 (0.1%)  <std::process::Command as cargo_llvm_lines::PipeTo>::pipe_to
    476 (1.5%)     6 (0.5%)  core::result::Result<T,E>::map
    440 (1.4%)     1 (0.1%)  cargo_llvm_lines::read_llvm_ir
    422 (1.4%)     2 (0.2%)  alloc::slice::merge
    399 (1.3%)     4 (0.4%)  alloc::vec::Vec<T>::extend_desugared
    388 (1.3%)     2 (0.2%)  alloc::slice::insert_head
    366 (1.2%)     5 (0.5%)  core::option::Option<T>::map
    304 (1.0%)     6 (0.5%)  alloc::alloc::box_free
    296 (1.0%)     4 (0.4%)  core::result::Result<T,E>::map_err
    295 (1.0%)     1 (0.1%)  cargo_llvm_lines::wrap_args
    291 (0.9%)     1 (0.1%)  core::char::methods::<impl char>::encode_utf8
    286 (0.9%)     1 (0.1%)  cargo_llvm_lines::run_cargo_rustc
    284 (0.9%)     4 (0.4%)  core::option::Option<T>::ok_or_else
```
Avoid Procedural Macro Crates
Procedural macros are the hot sauce of Rust development: they burn through CPU cycles, so use them with care (keyword: monomorphization).
If you heavily use procedural macros in your project (e.g., if you use serde), you can try to sidestep their impact on compile times with watt, a tool that offloads macro compilation to WebAssembly.
From the docs:
By compiling macros ahead-of-time to Wasm, we save all downstream users of the macro from having to compile the macro logic or its dependencies themselves. Instead, what they compile is a small self-contained Wasm runtime (~3 seconds, shared by all macros) and a tiny proc macro shim for each macro crate to hand off Wasm bytecode into the Watt runtime (~0.3 seconds per proc-macro crate you depend on). This is much less than the 20+ seconds it can take to compile complex procedural macros and their dependencies.
Note that this crate is still experimental.
(Oh, and did I mention that both `watt` and `cargo-llvm-lines` were built by David Tolnay, who is a frickin' steamroller of an engineer?)
Compile On A Beefy Machine
On portable devices, compiling can drain your battery and be slow. To avoid that, I'm using my machine at home, a 6-core AMD FX 6300 with 12GB RAM, as a build machine. I can use it in combination with Visual Studio Code Remote Development.
If you don't have a dedicated machine yourself, you can compile in the cloud instead.
Gitpod.io is superb for testing a cloud build as they provide you with a beefy machine (currently 16-core Intel Xeon 2.30GHz, 60GB RAM) for free during a limited period. Simply add `https://gitpod.io/#` in front of any Github repository URL. Here is an example for one of my Hello Rust episodes.
When it comes to buying dedicated hardware, here are some tips. Generally, you should get a proper multicore CPU like an AMD Ryzen Threadripper plus at least 32 GB of RAM.
Drastic Measures: Overclock Your CPU? :fire:
:warning: Warning: You can damage your hardware if you don't know what you are doing. Proceed at your own risk.
Here's an idea for the desperate. Now I don't recommend that to everyone, but if you have a standalone desktop computer with a decent CPU, this might be a way to squeeze out the last bits of performance.
Even though the Rust compiler executes a lot of steps in parallel, single-threaded performance is still quite relevant.
As a somewhat drastic measure, you can try to overclock your CPU. Here's a tutorial for my processor. (I owe you some benchmarks from my machine.)
Download ALL The Crates
If you have a slow internet connection, a big part of the initial build process is fetching all those shiny crates from crates.io. To mitigate that, you can download all crates in advance to cache them locally. criner does just that:
```
git clone https://github.com/the-lean-crate/criner
cd criner
cargo run --release -- mine
```
The archive size is surprisingly reasonable, with roughly 50GB of required disk space.
Help Others: Upload Leaner Crates For Faster Build Times
`cargo-diet` helps you build lean crates that significantly reduce download size (sometimes by 98%). It might not directly affect your own build time, but your users will surely be thankful. :blush:
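If you want to try it on your own crate, the basic flow (as I understand the tool) is:

```
# Computes a lean `include` list for your Cargo.toml and
# suggests it, so you ship fewer files to crates.io:
cargo install cargo-diet
cargo diet
```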
What's Next?
Phew! That was a long list. If you have any additional tips, please let me know.
If compiler performance is something you're interested in, why not collaborate on a tool to see what user code is causing rustc to use lots of time?
Also, once you're done optimizing your build times, how about optimizing runtime next? My friend Pascal Hertleif has a nice article on that.
Credits
Thanks to Luca Pizzamiglio for reviewing drafts of this article.