Writing Python inside your Rust code

Category: IT Technology · Published: 4 years ago


About a year ago, I published a Rust crate called inline-python, which allows you to easily mix some Python into your Rust code using a python!{ .. } macro. In this series, I’ll go through the process of developing this crate from scratch.

Sneak preview

If you’re not familiar with the inline-python crate, this is what it allows you to do:

fn main() {
    let who = "world";
    let n = 5;
    python! {
        for i in range('n):
            print(i, "Hello", 'who)
        print("Goodbye")
    }
}

It allows you to embed Python code right between your lines of Rust code. It even allows you to use your Rust variables inside the Python code.

We’ll start with a much simpler case, and slowly work our way up to this result (and more!).

Running Python code

First, let’s take a look at how we can run Python code from Rust. Let’s try to make this first simple example work:

fn main() {
    println!("Hello ...");
    run_python("print(\"... World!\")");
}

We could implement run_python by using std::process::Command to run the python executable and pass it the Python code, but if we ever expect to be able to define and read back Python variables, we’re probably better off if we start by using the PyO3 library instead.
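For comparison, the std::process::Command route could look roughly like the sketch below. The function name python_command and the executable name python3 are assumptions made for illustration (on some systems the interpreter is simply called python). The function only builds the command; nothing is executed until you call something like .status() or .output() on the result.

```rust
use std::process::Command;

/// Build (but do not run) a command that would execute `code`
/// in a separate Python process.
/// Assumption: the interpreter is on the PATH under the name "python3".
fn python_command(code: &str) -> Command {
    let mut cmd = Command::new("python3");
    // `-c` tells the interpreter to run the next argument as a program.
    cmd.arg("-c").arg(code);
    cmd
}
```

Because the Python code runs in a separate process here, Rust and Python share no memory, which is what makes passing variables back and forth awkward with this approach.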

PyO3 gives us Rust bindings for Python. It nicely wraps the Python C API, letting us interact with all kinds of Python objects directly from Rust. (It can even be used to write Python libraries in Rust, but that’s a whole other topic.)

Its Python::run function looks exactly like what we need. It takes the Python code as a &str, and allows us to define any variables in scope using two optional PyDicts (one for globals, one for locals). Let’s give it a try:

fn run_python(code: &str) {
    let py = pyo3::Python::acquire_gil(); // Acquire the 'global interpreter lock', as Python is not thread-safe.
    py.python().run(code, None, None).unwrap(); // No locals, no globals.
}

$ cargo run
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
     Running `target/debug/scratchpad`
Hello ...
... World!

Success!

Rule based macros

Writing inside a string literal is not the most convenient way to write Python, so let’s see if we can improve that. Macros let us define custom syntax within Rust, so let’s try to use one:

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
    }
}

Macros are normally defined using macro_rules!, which lets you define a macro using advanced ‘find and replace’ rules based on things like tokens and expressions. (See the chapter on macros in the Rust Book for an introduction to macro_rules!, and The Little Book of Rust Macros for all the scary details.)

Macros defined by macro_rules! cannot execute any code at compile time; they only apply replacement rules based on patterns. That is great for things like vec![], and even lazy_static!{ .. }, but not powerful enough for things such as parsing and compiling regular expressions (e.g. regex!("a.*b")).
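To make ‘replacement rules based on patterns’ a bit more concrete, here is a small sketch of a vec!-like macro. The name list_of! and the helper first_three are made up for this example; the real vec! is implemented differently. The point is only that the macro matches a pattern (a comma-separated list of expressions) and pastes the matched pieces into a template, without running any code of its own at compile time.

```rust
// A simplified, hypothetical cousin of vec![]: matches a comma-separated
// list of expressions (with an optional trailing comma) and expands to a
// block that pushes each one into a fresh Vec.
macro_rules! list_of {
    ($($item:expr),* $(,)?) => {{
        let mut v = Vec::new();
        $( v.push($item); )*
        v
    }};
}

fn first_three() -> Vec<i32> {
    // Expands to a block that creates a Vec and pushes 1, 2 and 3.
    list_of![1, 2, 3]
}
```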

In the matching rules of a macro, we can match on things like expressions, identifiers, types, and many other things. Since ‘valid Python code’ is not an option, we’ll just make our macro accept anything: raw tokens, as many as needed:

macro_rules! python {
    ($($code:tt)*) => {
        ...
    }
}

(See the resources linked above for details on how macro_rules! works.)

An invocation of our macro should result in run_python(".."), with all Python code wrapped in that string literal. We’re in luck: there’s a builtin macro that puts things in a string for us, called stringify!.

macro_rules! python {
    ($($code:tt)*) => {
        run_python(stringify!($($code)*));
    }
}

Let’s try!

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.32s
     Running `target/debug/scratchpad`
Hello ...
... World!

Success!

But wait, what happens if we have more than one line of Python code?

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
        print("Bye.")
    }
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/scratchpad`
Hello ...
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { type: Py(0x7f1c0a5649a0, PhantomData) }', src/main.rs:9:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Oof, that’s unfortunate.

To debug this, let’s properly print the PyErr, and also show the exact Python code we’re feeding to Python::run:

fn run_python(code: &str) {
    println!("-----");
    println!("{}", code);
    println!("-----");
    let py = pyo3::Python::acquire_gil();
    if let Err(e) = py.python().run(code, None, None) {
        e.print(py.python());
    }
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/scratchpad`
Hello ...
-----
print("... World!") print("Bye.")
-----
  File "<string>", line 1
    print("... World!") print("Bye.")
                        ^
SyntaxError: invalid syntax

Apparently both lines of Python code ended up on the same line, and Python rightfully complains about this being invalid syntax.

And now we’ve stumbled across the biggest problem we’ll have to overcome: stringify! messes up white-space.

White-space and tokens

Let’s take a closer look at what stringify! does:

fn main() {
    println!("{}", stringify!(
        a 123    b   c
        x ( y + z )
        // comment
        ...
    ));
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s
     Running `target/debug/scratchpad`
a 123 b c x(y + z) ...

Not only does it remove all unnecessary white-space, it even removes comments. The reason is that we’re working with tokens here, not the original source code: a, 123, b, etc.

One of the first things rustc does is tokenize the source code. This makes the rest of the parsing easier, as it no longer has to deal with individual characters like 1, 2, 3, but only with tokens such as ‘integer literal 123’. Also, white-space and comments are gone after tokenizing, as they are meaningless to the compiler.
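As a very crude model of why layout disappears, consider the toy ‘tokenizer’ sketched below. It is not how rustc works (a real tokenizer would split x() into several tokens and would not be fooled by // inside a string literal); the function name toy_tokens is invented for this illustration. It simply drops comments and splits on white-space, which is enough to show that two differently indented inputs can produce the same list of tokens.

```rust
/// Crude stand-in for a tokenizer: strips `//` comments and splits the
/// rest on white-space. Only meant to show that layout is not preserved.
fn toy_tokens(source: &str) -> Vec<&str> {
    source
        .lines()
        // Keep only the part of each line before a `//` comment.
        // (This ignores string literals; good enough for a toy.)
        .map(|line| line.split("//").next().unwrap_or(""))
        // Any run of spaces or tabs just separates tokens.
        .flat_map(|line| line.split_whitespace())
        .collect()
}
```

Once the input has been reduced to such a list, there is no way to tell from the list alone how many spaces or newlines were between two items.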

stringify!() is a way to convert a bunch of tokens back to a string, but on a ‘best effort’ basis: it will convert the tokens back to text, and only insert spaces around tokens when needed (to avoid turning b, c into bc).

So this is a bit of a dead end. Rustc has carelessly thrown our precious white-space away, which is very significant in Python.

We could try to have some code guess which spaces have to be replaced back by newlines, but indentation is definitely going to be a problem:

fn main() {
    let a = stringify!(
        if True:
            x()
        y()
    );
    let b = stringify!(
        if True:
            x()
            y()
    );
    dbg!(a);
    dbg!(b);
    dbg!(a == b);
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/scratchpad`
[src/main.rs:12] a = "if True : x() y()"
[src/main.rs:13] b = "if True : x() y()"
[src/main.rs:14] a == b = true

The two snippets of Python code have a different meaning, but stringify! gives us the same result for both.

Before giving up, let’s try the other type of macros.

Procedural macros

Rust’s procedural macros are another way to define macros. Whereas macro_rules! can only define ‘function-style macros’ (those with an !), procedural macros can also define custom derive macros (e.g. #[derive(Stuff)]) and attribute macros (e.g. #[stuff]).

Procedural macros are implemented as a compiler plugin. You get to write a function that gets access to the token stream the compiler sees, can do whatever it wants, and then needs to return a new token stream which the compiler will use instead (or in addition, in the case of a custom derive):

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    todo!()
}

That TokenStream there doesn’t bode well. We need the original source code, not just the tokens. But let’s just continue anyway. Maybe a procedural macro gives us more flexibility to hack our way around any problems.

Because procedural macros run Rust code as part of the compilation process, they need to go in a separate proc-macro crate, which is compiled before you can compile anything that uses it.

$ cargo new --lib python-macro
     Created library `python-macro` package

In python-macro/Cargo.toml:

[lib]
proc-macro = true

In Cargo.toml:

[dependencies]
python-macro = { path = "./python-macro" }

Let’s start with an implementation that just panics (todo!()), after printing the TokenStream:

// python-macro/src/lib.rs
extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    dbg!(input.to_string());
    todo!()
}

// src/main.rs
use python_macro::python;

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
        print("Bye.")
    }
}

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0
error[E0658]: procedural macros cannot be expanded to statements
 --> src/main.rs:5:5
  |
5 | /     python! {
6 | |         print("... World!")
7 | |         print("Bye.")
8 | |     }
  | |_____^
  |
  = note: see issue #54727 <https://github.com/rust-lang/rust/issues/54727> for more information
  = help: add `#![feature(proc_macro_hygiene)]` to the crate attributes to enable

Whelp, what happened here?

Rust complains that ‘procedural macros cannot be expanded to statements’, and something about enabling ‘hygienic macros’. Macro hygiene is the wonderful feature of Rust macros that keeps them from accidentally ‘leaking’ any names to the outside world (or the reverse). If a macro expands to code that uses some temporary variable named x, it will be separate from any x that appears in any code outside of the macro.
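Hygiene is easiest to see with a regular macro_rules! macro, where it already works on stable Rust. In the sketch below (the names double! and hygiene_demo are invented for this example), the macro declares its own x, and the caller passes in an expression that also mentions an x. The two do not interfere.

```rust
// A hypothetical macro that uses a temporary variable named `x` internally.
macro_rules! double {
    ($e:expr) => {{
        let x = 2; // This `x` belongs to the macro expansion only.
        x * $e     // Any `x` inside `$e` still refers to the caller's variable.
    }};
}

fn hygiene_demo() -> i32 {
    let x = 10;
    // Without hygiene this would compute 2 * (2 + 1) = 6.
    // With hygiene it computes 2 * (10 + 1) = 22.
    double!(x + 1)
}
```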

However, this feature isn’t stable yet for procedural macros. The result is that procedural macros are not (yet) allowed to appear in any place other than as an item by itself (e.g. at file scope, but not inside a function).

There exists a very ~~horrible~~ fascinating workaround for this, but let’s just enable the experimental #![feature(proc_macro_hygiene)] and continue our adventure.

(If you are reading this in the future, when proc_macro_hygiene has been stabilized: You could’ve skipped the last few paragraphs. ^^)

$ sed -i '1i#![feature(proc_macro_hygiene)]' src/main.rs
$ cargo r
   Compiling scratchpad v0.1.0
[python-macro/src/lib.rs:6] input.to_string() = "print(\"... World!\") print(\"Bye.\")"
error: proc macro panicked
 --> src/main.rs:6:5
  |
6 | /     python! {
7 | |         print("... World!")
8 | |         print("Bye.")
9 | |     }
  | |_____^
  |
  = help: message: not yet implemented

error: aborting due to previous error

error: could not compile `scratchpad`.

Our procedural macro panics as expected, after showing us the input it got as string:

print("... World!") print("Bye.")

Again, as expected, with the white-space thrown away. :(

Time to give up.

Or.. Maybe there’s a way to work around this.

Reconstructing white-space

Although rustc only works with tokens while parsing and compiling, it somehow still knows exactly where to point when it has errors to report. There are no newlines left in the tokens, but it still knows our error happened on lines 6 through 9. How?

It turns out that tokens contain quite a bit of information. They contain a Span, which is basically the start and end location of the token in the original source file. The Span can tell which file, line, and column number a token starts and ends at.

If we can get to this information, we can reconstruct the white-space by putting spaces and newlines between tokens to match their line and column information.

Functions that give us this information are not yet stable and are gated behind #![feature(proc_macro_span)]. Let’s enable it, and see what we get:

#![feature(proc_macro_span)]

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    for t in input {
        dbg!(t.span().start());
    }
    todo!()
}

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 7,
    column: 8,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 7,
    column: 13,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 8,
    column: 8,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 8,
    column: 13,
}

Nice! We got some numbers.

But there are only four tokens? It turns out ("... World!") appears as one token here, and not three ((, "... World!", and )). If we look at the documentation of TokenStream, we can see it doesn’t give us a stream of tokens, but of token trees. Apparently Rust’s tokenizer already matches parentheses (and braces and brackets) and doesn’t just give a linear list of tokens, but a tree of tokens. Tokens inside parentheses will be children of a single Group token.
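The same grouping is visible from macro_rules!, where a tt fragment matches exactly one token tree. As a small sketch (the macro name count_tts! and the function count_demo are made up for this illustration), the recursive macro below peels off one token tree at a time and counts them. A parenthesized group counts as one, no matter how much is inside it.

```rust
// Counts the token trees it is given by consuming them one at a time.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn count_demo() -> usize {
    // Four token trees: `print`, `("... World!")`, `print`, `("Bye.")`.
    count_tts!(print("... World!") print("Bye."))
}
```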

Let’s modify our procedural macro to recursively go over all the tokens inside groups as well (and improve the output a bit):

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    print(input);
    todo!()
}

fn print(input: TokenStream) {
    for t in input {
        if let TokenTree::Group(g) = t {
            println!("{:?}: open {:?}", g.span_open().start(), g.delimiter());
            print(g.stream());
            println!("{:?}: close {:?}", g.span_close().start(), g.delimiter());
        } else {
            println!("{:?}: {}", t.span().start(), t.to_string());
        }
    }
}

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0
LineColumn { line: 7, column: 8 }: print
LineColumn { line: 7, column: 13 }: open Parenthesis
LineColumn { line: 7, column: 14 }: "... World!"
LineColumn { line: 7, column: 26 }: close Parenthesis
LineColumn { line: 8, column: 8 }: print
LineColumn { line: 8, column: 13 }: open Parenthesis
LineColumn { line: 8, column: 14 }: "Bye."
LineColumn { line: 8, column: 20 }: close Parenthesis

Wonderful!

Now to reconstruct the white-space, we need to insert newlines if we’re not on the right line yet, and spaces if we’re not in the right column yet. Let’s see:

#![feature(proc_macro_span)]

extern crate proc_macro;
use proc_macro::{TokenTree, TokenStream, LineColumn};

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    let mut s = Source {
        source: String::new(),
        line: 1,
        col: 0,
    };
    s.reconstruct_from(input);
    println!("{}", s.source);
    todo!()
}

struct Source {
    source: String,
    line: usize,
    col: usize,
}

impl Source {
    fn reconstruct_from(&mut self, input: TokenStream) {
        for t in input {
            if let TokenTree::Group(g) = t {
                let s = g.to_string();
                self.add_whitespace(g.span_open().start());
                self.add_str(&s[..1]); // the '[', '{' or '('.
                self.reconstruct_from(g.stream());
                self.add_whitespace(g.span_close().start());
                self.add_str(&s[s.len() - 1..]); // the ']', '}' or ')'.
            } else {
                self.add_whitespace(t.span().start());
                self.add_str(&t.to_string());
            }
        }
    }

    fn add_str(&mut self, s: &str) {
        // Let's assume for now s contains no newlines.
        self.source += s;
        self.col += s.len();
    }

    fn add_whitespace(&mut self, loc: LineColumn) {
        while self.line < loc.line {
            self.source.push('\n');
            self.line += 1;
            self.col = 0;
        }
        while self.col < loc.column {
            self.source.push(' ');
            self.col += 1;
        }
    }
}

Fingers crossed..

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0






        print("... World!")
        print("Bye.")
error: proc macro panicked

Okay, that works, but what’s with all the extra newlines and spaces? Oh right, the first token starts at line 7 column 8, so it correctly puts print on line 7 in column 8. The location we’re looking at is the exact location in the .rs file.

The extra newlines at the start are not a problem (empty lines have no effect in Python). It even has a nice side effect: When Python reports an error, the line number it reports will match the line number in the .rs file.

However, the 8 spaces are a problem. Although the Python code inside our python!{..} is properly indented with respect to our Rust code, the Python code we extract should start at a ‘zero’ indentation level. Otherwise Python will complain about invalid indentation.

Let’s subtract the column number of the first token from all column numbers:

    start_col: None,
    // <snip>
    start_col: Option<usize>,
    // <snip>
    let start_col = *self.start_col.get_or_insert(loc.column);
    while self.col < loc.column - start_col {
        self.source.push(' ');
        self.col += 1;
    }
    // <snip>

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0






print("... World!")
print("Bye.")
error: proc macro panicked

Awesome!
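Since the snippet above only shows the changed lines, here is the whole idea once more as a standalone sketch that runs without the nightly proc_macro features. It works on plain (line, column, text) records instead of real tokens; the Tok struct and the reconstruct function are stand-ins invented for this illustration, not part of proc_macro or of the crate. Two small liberties are taken: the width of a token is counted in characters rather than bytes, and a token that starts to the left of the first token is clamped to column zero.

```rust
/// Hypothetical stand-in for a token and the start of its Span.
struct Tok {
    line: usize,   // 1-based line in the original file
    column: usize, // 0-based column, counted in characters
    text: &'static str,
}

/// Rebuild source text from positioned tokens, shifting everything left
/// by the column of the first token.
fn reconstruct(tokens: &[Tok]) -> String {
    let mut source = String::new();
    let mut line = 1;
    let mut col = 0;
    let mut start_col: Option<usize> = None;
    for t in tokens {
        // Emit newlines until we are on the token's line.
        while line < t.line {
            source.push('\n');
            line += 1;
            col = 0;
        }
        // The first token we see defines the 'zero' indentation level.
        let base = *start_col.get_or_insert(t.column);
        // Assumption of this sketch: clamp instead of underflowing.
        let target = t.column.saturating_sub(base);
        // Emit spaces until we are in the token's column.
        while col < target {
            source.push(' ');
            col += 1;
        }
        source.push_str(t.text);
        col += t.text.chars().count();
    }
    source
}
```

Feeding it the eight positions printed earlier (print at line 7, column 8, and so on) gives six empty lines followed by the two print calls at column zero, matching the output above.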

Now we only have to turn this string into a string literal token and put run_python(); around it:

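    // Requires, at the top of the file:
    //   use proc_macro::{Delimiter, Group, Ident, Literal, Punct, Spacing, Span};
    //   use std::iter::FromIterator; // already in the prelude on the 2021 edition
    TokenStream::from_iter(vec![
        // The identifier `run_python`.
        TokenTree::from(Ident::new("run_python", Span::call_site())),
        // The parenthesized argument list, holding a single string literal.
        TokenTree::Group(Group::new(
            Delimiter::Parenthesis,
            TokenStream::from(TokenTree::from(Literal::string(&s.source))),
        )),
        // The trailing semicolon.
        TokenTree::from(Punct::new(';', Spacing::Alone)),
    ])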

Ugh, working with token trees is horrible. Especially making trees and streams from scratch.

If only there was a way to just write the Rust code we want to produce and— Ah yes, the quote! macro from the quote crate:

    let source = s.source;
    quote!( run_python(#source); ).into()

Okay, that’s better.

Now to test it using our original run_python function:

#![feature(proc_macro_hygiene)]
use python_macro::python;

fn run_python(code: &str) {
    let py = pyo3::Python::acquire_gil();
    if let Err(e) = py.python().run(code, None, None) {
        e.print(py.python());
    }
}

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
        print("Bye.")
    }
}

$ cargo r
   Compiling scratchpad v0.1.0 (/tmp/2020-04-17-15-56-14/scratchpad)
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/scratchpad`
Hello ...
... World!
Bye.

Success!

:tada:

Turning this into a library

Now to turn this into a reusable library, we:

Remove fn main,
Rename main.rs to lib.rs,
Give the crate a good name, like inline-python,
Make run_python public,
Change the run_python call in the quote!() to ::inline_python::run_python, and
Add pub use python_macro::python; to re-export the python! macro from this crate.

What’s next

We can now run snippets of Python between Rust code. There are probably tons of things to improve and plenty of bugs to discover.

The biggest problem for now is that this isn’t very useful yet, since no data can (easily) cross the Rust-Python border.

In the next post, we’ll take a look at how we can make Rust variables available to the Python code.
