So you’re writing Rust but it’s not fast enough? Even though you’re using cargo build --release?
Here are some small things you can do to increase the runtime speed of a Rust project, practically without changing any code!
Please remember that the following suggestions do not replace actual profiling and optimizations! I also think it goes without saying that the only way to detect if any of this helps is having benchmarks that represent how your application behaves under real usage.
Tweaking our release profile
Let’s first of all enable some more optimizations for when we do cargo build --release. The deal is pretty simple: we enable some features that make release builds even slower to compile, but get more thorough optimizations as a reward.
We add the flags described below to our main Cargo.toml file, i.e., the topmost manifest file in case you are using a Cargo workspace. If you don’t already have a section called profile.release, add it:
[profile.release]
Link-time optimization
The first thing we’ll do is enable link-time optimization (LTO). It’s a kind of whole-program or inter-module optimization as it runs as the very last step when linking the different parts of your binary together. You can think of it as allowing better inlining across dependency boundaries (but it’s of course more complicated than that).
Rust supports several LTO modes, and the one we want is “optimize across all crates”, which is called “fat”. To set this, add the lto flag to your profile:
lto = "fat"
Code generation units
Next up is a similar topic. To speed up compile times, Rust tries to split your crates into small chunks and compile as many of them in parallel as possible. The downside is that there are fewer opportunities for the compiler to optimize code across these chunks. So, let’s tell it to do one chunk per crate:
codegen-units = 1
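Put together, the profile section so far might look like the following sketch. The comments are my own summary of the two settings, and the exact speedup (if any) depends on your project, so measure with your benchmarks.

```toml
# Cargo.toml (the topmost manifest if you use a workspace)
[profile.release]
lto = "fat"         # optimize across all crates at link time
codegen-units = 1   # one chunk per crate: slower builds, more room to optimize
```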
Setting a specific target CPU
By default, Rust wants to build a binary that works on as many machines of the target architecture as possible. However, you might actually have a pretty new CPU with cool new features! To enable those, we add -C target-cpu=native as a “Rust flag”, i.e., via the environment variable RUSTFLAGS or the target’s rustflags field in your .cargo/config. Keep in mind that a binary built this way may not run on older CPUs that lack those features.
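If you want to check whether the flag actually changed anything, one option is to ask the compiler which CPU features it was allowed to assume. Here is a minimal sketch using the built-in cfg! macro; the function name compiled_simd_features and the choice of three x86 features are just assumptions for illustration.

```rust
/// Lists a few x86 SIMD extensions that were enabled at compile time.
/// (Hypothetical helper; the feature selection is only an example.)
fn compiled_simd_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // cfg!(...) is evaluated at compile time, so this reflects the flags
    // the binary was built with, not the CPU it happens to run on.
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        enabled.push("fma");
    }
    enabled
}
```

Printing the result of compiled_simd_features() once from a plain cargo build --release and once with RUSTFLAGS="-C target-cpu=native" set should show whether your machine unlocks anything beyond the baseline.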
Aborting
Now we get into some of the more unsafe options. Remember how Rust by default uses stack unwinding (on the most common platforms)? That costs performance! Let’s skip stack traces and the ability to catch panics for reduced code size and better cache usage:
panic = "abort"
Please note that some libraries might depend on unwinding and will explode horribly if you enable this!
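To make “the ability to catch panics” a bit more concrete, here is a small sketch of the kind of code that relies on unwinding. The function try_risky is made up for this example; std::panic::catch_unwind is the standard library API it builds on.

```rust
use std::panic;

/// Doubles the input, turning a panic into an error value.
/// (Hypothetical helper for illustration.)
fn try_risky(input: i32) -> Result<i32, String> {
    panic::catch_unwind(move || {
        if input < 0 {
            panic!("negative input");
        }
        input * 2
    })
    // This branch is only reachable when panics unwind.
    .map_err(|_| "recovered from a panic".to_string())
}
```

With the default unwinding strategy, try_risky(-1) returns an Err. With panic = "abort", the process terminates at the panic! call instead, so the recovery path never runs.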
Using a different allocator
One thing many Rust programs do is allocate memory. And they don’t just do this themselves but actually use an (external) library for that: an allocator. Current Rust binaries use the system allocator by default; previously, they included their own with the standard library. (This change has led to smaller binaries and better debuggability, which made some people quite happy.)
Sometimes your system’s allocator is not the best pick, though. Not to worry, we can change it! I suggest giving both jemalloc and mimalloc a try.
jemalloc
jemalloc is the allocator that Rust previously shipped with and that the Rust compiler still uses itself. Its focus is to reduce memory fragmentation and support high concurrency. It’s also the default allocator on FreeBSD. If this sounds interesting to you, let’s give it a try!
First off, add the jemallocator crate as a dependency:
[dependencies]
jemallocator = "0.3.2"
Then in your application’s entry point (main.rs), set it as the global allocator like this:
#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;
Please note that jemalloc doesn’t support all platforms.
mimalloc
Another interesting alternative allocator is mimalloc. It was developed by Microsoft, has quite a small footprint, and some innovative ideas for free lists.
It also features configurable security features (have a look at its Cargo.toml). Which means we can turn them off for more performance!
Add the mimalloc crate as a dependency like this:
[dependencies]
mimalloc = { version = "0.1.17", default-features = false }
and, same as above, add this to your entry point file:
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
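Both crates plug into the same #[global_allocator] hook from the standard library. If you are unsure whether your program allocates enough for a different allocator to matter, a rough sketch like the one below can count allocations by wrapping the system allocator. The names Counting, ALLOCATIONS and allocation_count are made up for this example, and you would use this in place of (not next to) the jemalloc or mimalloc lines, since a program can only have one global allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts how often alloc is called.
/// (Hypothetical type for illustration.)
struct Counting;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Forward the actual work to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// Number of allocations made since the program started.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

Calling allocation_count() before and after a hot code path gives a first impression of how allocation-heavy it is. It is no substitute for a real profiler, though.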
Profile Guided Optimization
This is a neat feature of LLVM but I’ve never used it. Please read the docs.
Actual profiling and optimizing your code
Now this is where you need to actually adjust your code and fix all those clone() calls. Sadly, this is a topic for another post!
(While you wait another year for me to write it, you can read about cows!)
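As a tiny taste of what “fixing clone() calls” with cows can look like, here is a sketch built on std::borrow::Cow. The function expand_tabs is invented for this example; the point is only that the common case returns a borrow and allocates nothing.

```rust
use std::borrow::Cow;

/// Replaces tabs with four spaces, allocating only if the input has a tab.
/// (Hypothetical helper for illustration.)
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Slow path: build a new String.
        Cow::Owned(input.replace('\t', "    "))
    } else {
        // Fast path: hand back the original data without copying.
        Cow::Borrowed(input)
    }
}
```

Whether this pays off depends on how often the fast path is hit, which is exactly the kind of question your benchmarks should answer.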
Edit: People keep asking for those actual tips on how to optimize Rust code. And luckily they had some good material for me to link to:
- The very convenient cargo flamegraph (also works as a standalone tool)
- Christopher Sebastian recently published How To Write Fast Rust Code
- Robin Freyler’s Fastware Workshop from RustFest 2018