Cheap tricks for high-performance Rust

So you’re writing Rust but it’s not fast enough? Even though you’re using cargo build --release? Here are a few small things you can do to increase the runtime speed of a Rust project – practically without changing any code!

Please remember that the following suggestions do not replace actual profiling and optimizations! I also think it goes without saying that the only way to detect if any of this helps is having benchmarks that represent how your application behaves under real usage.
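If you don’t have any benchmarks yet, a minimal sketch using the criterion crate might look like this (expensive_lookup is a hypothetical stand-in for whatever hot path you actually care about):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical stand-in for the code you actually want to measure.
fn expensive_lookup(n: u64) -> u64 {
    (0..n).map(|i| i * i).sum()
}

fn bench_lookup(c: &mut Criterion) {
    c.bench_function("expensive_lookup", |b| {
        // black_box keeps the compiler from optimizing the input away
        b.iter(|| expensive_lookup(black_box(10_000u64)))
    });
}

criterion_group!(benches, bench_lookup);
criterion_main!(benches);

Put this in a file like benches/my_bench.rs, add criterion as a dev-dependency, and declare the bench target with harness = false in Cargo.toml so criterion’s own harness is used.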

Tweaking our release profile

Let’s first of all enable some more optimizations for when we do cargo build --release. The deal is pretty simple: We enable some features that make release builds even slower to compile but get more thorough optimizations as a reward.

We add the flags described below to our main Cargo.toml file, i.e., the topmost manifest file in case you are using a Cargo workspace. If you don’t already have a section called profile.release, add it:

[profile.release]

Link-time optimization

The first thing we’ll do is enable link-time optimization (LTO). It’s a kind of whole-program or inter-module optimization as it runs as the very last step when linking the different parts of your binary together. You can think of it as allowing better inlining across dependency boundaries (but it’s of course more complicated than that).

Rust supports several kinds of LTO, and the one we want is “optimize across all crates”, which is called “fat”. To set this, add the lto flag to your profile:

lto = "fat"

Code generation units

Next up is a similar topic. To speed up compile times, Rust tries to split your crates into small chunks and compile as many of them in parallel as possible. The downside is that there are fewer opportunities for the compiler to optimize code across these chunks. So, let’s tell it to use one chunk per crate:

codegen-units = 1

Setting a specific target CPU

By default, Rust wants to build a binary that works on as many machines of the target architecture as possible. However, you might actually have a pretty new CPU with cool new features! To enable those, we add

-C target-cpu=native

as a “Rust flag”, i.e., via the environment variable RUSTFLAGS or the target’s rustflags field in your .cargo/config.
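For example, a minimal sketch of the config-file variant could look like this (using the generic [build] section; a target-specific section works the same way):

# .cargo/config (or .cargo/config.toml)
[build]
rustflags = ["-C", "target-cpu=native"]

The environment-variable variant is simply RUSTFLAGS="-C target-cpu=native" cargo build --release.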

Aborting

Now we get into some of the more unsafe options. Remember how Rust by default uses stack unwinding (on the most common platforms)? That costs performance! Let’s give up unwinding and the ability to catch panics in exchange for reduced code size and better cache usage:

panic = "abort"

Please note that some libraries might depend on unwinding and will explode horribly if you enable this!
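For reference, with everything from this section applied, the profile section of the top-level Cargo.toml ends up looking like this (leave out the panic line if anything you depend on needs unwinding):

[profile.release]
lto = "fat"
codegen-units = 1
panic = "abort"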

Using a different allocator

One thing many Rust programs do is allocate memory. And they don’t just do this themselves but actually use an (external) library for that: an allocator. Current Rust binaries use the system allocator by default; previously, they shipped their own allocator with the standard library. (This change has led to smaller binaries and better debuggability, which made some people quite happy.)

Sometimes your system’s allocator is not the best pick, though. Not to worry, we can change it! I suggest giving both jemalloc and mimalloc a try.

jemalloc

jemalloc is the allocator that Rust previously shipped with and that the Rust compiler still uses itself. Its focus is to reduce memory fragmentation and support high concurrency. It’s also the default allocator on FreeBSD. If this sounds interesting to you, let’s give it a try!

First off, add the jemallocator crate as a dependency:

[dependencies]
jemallocator = "0.3.2"

Then, in your application’s entry point (main.rs), set it as the global allocator like this:

#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;

Please note that jemalloc doesn’t support all platforms.

mimalloc

Another interesting alternative allocator is mimalloc. It was developed by Microsoft, has quite a small footprint, and brings some innovative ideas for free lists.

It also has configurable security features (have a look at its Cargo.toml), which means we can turn them off for more performance! Add the mimalloc crate as a dependency like this:

[dependencies]
mimalloc = { version = "0.1.17", default-features = false }

and, same as above, add this to your entry point file:

#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

Profile Guided Optimization

This is a neat feature of LLVM but I’ve never used it. Please read the docs.
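For the curious, the usual workflow from the rustc documentation looks roughly like this (the paths and binary name are placeholders; llvm-profdata comes with the llvm-tools-preview rustup component):

# Step 1: build an instrumented binary
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release
# Step 2: run it on representative workloads to collect profiles
./target/release/my-program --typical-workload
# Step 3: merge the raw profiles
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
# Step 4: rebuild using the collected profile
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release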

Actual profiling and optimizing your code

Now this is where you need to actually adjust your code and fix all those clone() calls. Sadly, this is a topic for another post! (While you wait another year for me to write it, you can read about cows!)
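As a tiny preview, one common pattern is replacing an unconditional clone with a std::borrow::Cow that only allocates when it really has to; a minimal sketch:

use std::borrow::Cow;

// Return borrowed data when possible and allocate only when we actually
// need to modify the input.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_")) // allocation only in this branch
    } else {
        Cow::Borrowed(input) // no allocation, no clone
    }
}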

Edit: People keep asking for those actual tips on how to optimize Rust code. And luckily (I tricked them!) they had some good material for me to link to:

