Fixing Rust's test suite on RISC-V

栏目: IT技术 · 发布时间: 4年前

内容简介:Myprevious blog post introduced my work to improveIn the last blog post I mentioned an issue where some

Fixing Rust's test suite on RISC-V

Myprevious blog post introduced my work to improve Rust's support for RISC-V Linux systems. Since then I fixed a couple of interesting compiler bugs. This blog post is more technical - describing these bugs and explaining some rustc internals along the way. I conclude by reporting on movement in the broader Rust community regarding RISC-V. I assumed that the reader is comfortable with programming terminology. This blog post contains Rust code samples but the reader is not expected to be fluent in Rust.

Hanging Tests

In the last blog post I mentioned an issue where some rustc tests would hang indefinitely. While I was tracking down the problem, the upstream Rust project upgraded from LLVM 9 to LLVM 10 , which fixed the hanging tests. I did not look into this issue further.

What is LLVM anyway?

Modern compilers do not translate directly from source code into machine code. Source code is intended to be convenient for humans to read; but it often is not convenient for a compiler to reason about. Instead, compilers transform source code through one (or more) intermediate representations on its way to becoming machine code. rustc compiles Rust source through intermediate representations into LLVM Intermediate Representation(LLVM IR). LLVM then runs optimisation passes on the LLVM IR and finally generates machine code for the chosen architecture.

I think of LLVM IR as an elaborated machine-independenttyped assembly language with unlimited registers. See this LLVM IR for a function to add integers:

add.rs

pub fn add(x: i32, y: i32) -> i32 {
    x + y
}

rustc --crate-type lib -O add.rs --emit llvm-ir

; ModuleID = 'add.3a1fbbbh-cgu.0'
source_filename = "add.3a1fbbbh-cgu.0"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; add::add
; Function Attrs: norecurse nounwind nonlazybind readnone uwtable
define i32 @_ZN3add3add17h7cc3d194e9d7e4cdE(i32 %x, i32 %y) unnamed_addr #0 {
start:
  %0 = add i32 %y, %x
  ret i32 %0
}

attributes #0 = { norecurse nounwind nonlazybind readnone uwtable "probe-stack"="__rust_probestack" "target-cpu"="x86-64" }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}

In LLVM IR, lines starting with ; are comments. target datalayout and target triple tell LLVM what architecture it should generate code for.

Next is the function definition. In Rust, the function name is add::add ; but the compiler mangles the name to _ZN3add3add17h7cc3d194e9d7e4cdE . Then define i32 @foo(i32 %x, i32 %y) means "define a function called foo which takes two 32-bit signed integer ( i32 ) arguments and returns an i32 " . Inside the function, a start: label begins the block; then %0 = add i32 %y, %x performs signed 32-bit addition on x and y and puts the result in a variable automatically named 0 . Finally, ret i32 %0 returns the value of 0 as an i32 .

The attributes #0 line provides extra annotations for all functions tagged with #0 (such as add::add ). The annotations give hints for LLVM optimisation passes.

In theory, to support a new target architecture rustc only needs to know how to tell LLVM that it should generate code for that architecture. rustc can then generate LLVM IR as usual and LLVM will handle everything specific to the architecture. For example, see the tiny PR which initially added the RISC-V Linux target.

In practice, rustc also needs to know how to conform to the ABI for the target so that the generated code is interoperable with other programming languages. The RISC-V ABI was added separately.

* As well as LLVM, rustc also has an experimental cranelift backend .

** LLVM IR can be ABI specific but is not machine specific.

Code generation Test Failure

Other than fixing the hanging ui tests, LLVM 10 also broke code generation tests for RISC-V. In LLVM 9 IR, function arguments weren't always named but in LLVM 10 they are . This change broke rustc code generation tests which look for specific strings in the LLVM IR output by rustc .

After narrowing this test failure down to the LLVM 10 upgrade, I found the relevant change by looking at clang (which also uses LLVM) tests for the RISC-V ABI. The fix was as simple as copying the clang test changes and adapting them for rustc 's tests.

UI Tests

The ui tests for rustc check all user facing aspects of the compiler. Some of these tests check that rustc displays the correct error messages when compiling erroneous source.

There was a bug highlighted by some of these tests on RISC-V: rustc displays the correct errors but in the wrong order! To debug this I used -Z treat-err-as-bug=n to cause rustc to panic on the n th error. The panic backtrace shows where the error is generated from. In this case, the miss-ordered errors came from src/librustc_resolve/late.rs :

/// Checks that all of the arms in an or-pattern have exactly the
/// same set of bindings, with the same binding modes for each.
fn check_consistent_bindings(&mut self, pats: &[P<Pat>]) -> Vec<BindingMap> {
    let mut missing_vars = FxHashMap::default();
    let mut inconsistent_vars = FxHashMap::default();

    // 1) Compute the binding maps of all arms.
[...]

    // 2) Record any missing bindings or binding mode inconsistencies.
[...]

    // 3) Report all missing variables we found.
    let mut missing_vars = missing_vars.iter_mut().collect::<Vec<_>>();
    missing_vars.sort();

    for (name, mut v) in missing_vars {
        if inconsistent_vars.contains_key(name) {
            v.could_be_path = false;
        }
        self.r.report_error(
            *v.origin.iter().next().unwrap(),
            ResolutionError::VariableNotBoundInPattern(v),
        );
    }

[...]
}

In part 3, missing_vars is sorted before the errors are reported so how can the errors be out of order?

At the time of the sort, missing_vars is a Vec tor of tuples, with each tuple containing a Symbol and a mutable reference to a BindingError . In Rust this type is written as Vec<(Symbol, &mut BindingError)> . Rust tuples sort first by the left-most element (in this case, the Symbol ). To explain Symbol ordering following a sort, first we must look at how strings are used within rustc .

String Interning

Source code often contains duplicates of the same string token. For example, in the above listing of check_consistent_bindings the string "FxHashMap" occurs twice and "missing_vars" occurs five times. Allocating separate strings for each of these occurrences would waste memory and time. Instead, rustc allocates each string once and uses indices to refer to the full string each time it is needed. These indices are 32-bit unsigned integers (in a wrapper type) and so can be copied and compared efficiently. The process of allocating a string only once and using references to that string is called "interning".

Symbol is the type representing an interned string:

#[derive(Clone, Copy, PartialEq, ParitialOrd, Hash)]
pub struct Symbol(SymbolIndex);

Here we can see that the implementation of PartialOrd for Symbol (which provides the Ord ering of Vec::sort above) derives from the index's Ord ering. But where does the index come from?

// The `&'static str`s in this type actually point into the arena.
#[derive(Default)]
pub struct Interner {
    arena: DroplessArena,
    names: FxHashMap<&'static str, Symbol>,
    strings: Vec<&'static str>,
}

impl Interner {
    #[inline]
    pub fn intern(&mut self, string: &str) -> Symbol {
        if let Some(&name) = self.names.get(string) {
            return name;
        }

        let name = Symbol::new(self.strings.len() as u32);

        // `from_utf8_unchecked` is safe since we just allocated a `&str` which is known to be
        // UTF-8.
        let string: &str =
            unsafe { str::from_utf8_unchecked(self.arena.alloc_slice(string.as_bytes())) };
        // It is safe to extend the arena allocation to `'static` because we only access
        // these while the arena is still alive.
        let string: &'static str = unsafe { &*(string as *const str) };
        self.strings.push(string);
        self.names.insert(string, name);
        name
    }
}

impl Symbol {
    const fn new(n: u32) -> Self {
        Symbol(SymbolIndex::from_u32(n))
    }

    /// Maps a string to its interned representation.
    pub fn intern(string: &str) -> Self {
        with_interner(|interner| interner.intern(string))
    }

    /// Access the symbol's chars. This is a slowish operation because it
    /// requires locking the symbol interner.
    pub fn with<F: FnOnce(&str) -> R, R>(self, f: F) -> R {
        with_interner(|interner| f(interner.get(self)))
    }
}

rustc interns strings using Interner::intern , which returns a Symbol . References to the interned strings are stored in a Vec tor and the Symbol contains the index of the string reference in that Vec tor. Therefore, the Symbol can look up its string by indexing into the strings vector and following the reference (pointer) to the actual string (which is allocated in the arena ).

For example, if the strings "a" , "b" and "c" are interned, then (ignoring the referencing), the Interner 's Vec will look like ["a", "b", "c"] . If instead we interned "b" , "a" , "b" and "c" ; the Vec will look like ["b", "a", "c"] because the second attempt to intern "b" finds that "b" is already interned and so just returns the index of the existing copy of "b" . Carrying on with this second example, imagine the following:

let mischief = Symbol::intern("b");
let a = Symbol::intern("a");
let b = Symbol::intern("b");
let c = Symbol::intern("c");

let mut v = vec![c, b, a];
v.sort();

While we might expect v to end up equal to [a, b, c] , v is actually sorted to equal [b, a, c] because Symbol s are sorted by their indices not by the strings they point to. b received the lowest index because it was interned first. This semantic can be useful because the index ordering is faster than looking up the full strings and comparing them; and it might be useful to know what order the strings occurred in the input.

Back to out of order errors

You may now be wondering what this has to do with the errors being reported out of order. Sure Symbol s are sorted in a funny order but why would that be different depending upon the processor architecture?

To find out, I added code to print out the order in which strings are interned. It turned out that architecture specific extensions (such as AVX on X86) are interned along with other keywords before the source code is processed. RISC-V architecture extensions are named with single letters: see this from src/librustc_codegen_llvm/llvm_util.rs

const RISCV_WHITELIST: &[(&str, Option<Symbol>)] = &[
    ("m", Some(sym::riscv_target_feature)),
    ("a", Some(sym::riscv_target_feature)),
    ("c", Some(sym::riscv_target_feature)),
    ("f", Some(sym::riscv_target_feature)),
    ("d", Some(sym::riscv_target_feature)),
    ("e", Some(sym::riscv_target_feature)),
];

Here m is the extension for multiply and divide, a provides atomic instructions, c compressed instructions, etc.

The tests with badly ordered errors use the variable names a , b and c in that order. So most architectures order symbols as Symbol("a") < Symbol("b") < Symbol("c") (preserving the order in the source code), but RISC-V instead orders symbols as Symbol("a") < Symbol("c") < Symbol("b") because "a" and "c" correspond to names of RISC-V extensions; and architecture extensions are interned before source code is processed.

The fix involved modifying the error sorting to use the strings pointed to by the Symbol instead of the Symbol itself.

Other Rust RISC-V developments

With both of my PRs to fix tests merged, the full Rust test suite passes for a RISC-V cross-compilation toolchain (running tests under QEMU system emulation). I wrote another PR adding a docker image to make it easy to set up a RISC-V emulator and run these tests.

More excitingly, there was a policy-change regarding the support tiers (discussed in my last blog post). RISC-V Linux is now Tier 2 along with many others . However, in later discussion it turned out that RISC-V Linux is only Tier 2 for cross compilation. Native RISC-V binaries are not published by the Rust Project, but there is a PR to add RISC-V as a host platform. The compiler team made it clear that they require a team to maintain RISC-V host support so I organised one . So far most of the compiler team have agreed to RISC-V host support. I hope to see official RISC-V host binaries from the Rust Project some time soon!

Want to learn more?

Improving RISC-V Linux support in Rust

Build Engineering at Codethink

Get in touch


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

x86汇编语言

x86汇编语言

李忠、王晓波、余洁 / 电子工业出版社 / 2013-1 / 56.00元

《x86汇编语言:从实模式到保护模式》采用开源的NASM汇编语言编译器和VirtualBox虚拟机软件,以个人计算机广泛采用的Intel处理器为基础,详细讲解了Intel处理器的指令系统和工作模式,以大量的代码演示了16/32/64位软件的开发方法,介绍了处理器的16位实模式和32位保护模式,以及基本的指令系统。 《x86汇编语言:从实模式到保护模式》是一本有趣的书,它没有把篇幅花在计算一......一起来看看 《x86汇编语言》 这本书的介绍吧!

RGB转16进制工具
RGB转16进制工具

RGB HEX 互转工具

在线进制转换器
在线进制转换器

各进制数互转换器

MD5 加密
MD5 加密

MD5 加密工具