Aug 2, 2020
Tags: programming, devblog, python, rust
This post is a quick walkthrough of how I wrote a Python library, procmaps, in nothing but Rust. It uses PyO3 for the bindings and maturin to manage the build (as well as produce manylinux1-compatible wheels).
The code is, of course, available on GitHub, and can be installed directly with a modern Python (3.5+) via pip without a local Rust install:
    $ pip3 install procmaps
Procmaps?
procmaps is an extremely small Python library, backed by a similarly small Rust library.
All it does is parse “maps” files, best known for their presence under procfs on Linux, into a list of Map objects. Each Map, in turn, contains the basic attributes of the mapped memory region.
By their Python attributes:
    import os
    import procmaps

    # also: from_path, from_str
    # N.B.: named map_ instead of map to avoid shadowing the map function
    map_ = procmaps.from_pid(os.getpid())[0]

    map_.begin_address  # the begin address for the mapped region
    map_.end_address    # the end address for the mapped region
    map_.is_readable    # is the mapped region readable?
    map_.is_writable    # is the mapped region writable?
    map_.is_executable  # is the mapped region executable?
    map_.is_shared      # is the mapped region shared with other processes?
    map_.is_private     # is the mapped region private (i.e., copy-on-write)?
    map_.offset         # the offset into the region's source that the region originates from
    map_.device         # a tuple of (major, minor) for the device that the region's source is on
    map_.inode          # the inode of the source for the region
    map_.pathname       # the "pathname" field for the region, or None if an anonymous map
Critically: apart from the imports and the os.getpid() call, all of the code above calls directly into compiled Rust.
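For readers who have not looked at a “maps” file before, the sketch below shows how one line breaks down into the attributes listed above. It is a pure-Python illustration only, not how procmaps is implemented (procmaps does this in Rust); MapLine, parse_maps_line, and the sample line are hypothetical names and data invented for the example.

```python
from typing import NamedTuple, Optional, Tuple


class MapLine(NamedTuple):
    """Hypothetical pure-Python stand-in for procmaps.Map (illustration only)."""

    begin_address: int
    end_address: int
    is_readable: bool
    is_writable: bool
    is_executable: bool
    is_shared: bool
    is_private: bool
    offset: int
    device: Tuple[int, int]
    inode: int
    pathname: Optional[str]


def parse_maps_line(line: str) -> MapLine:
    # Layout: begin-end perms offset major:minor inode [pathname]
    fields = line.split(None, 5)
    addr_range, perms, offset, device, inode = fields[:5]
    # Addresses, the offset, and the device numbers are hexadecimal.
    begin, end = (int(part, 16) for part in addr_range.split("-"))
    major, minor = (int(part, 16) for part in device.split(":"))
    # The pathname field is absent for anonymous mappings.
    pathname = fields[5].strip() if len(fields) > 5 else None
    return MapLine(
        begin_address=begin,
        end_address=end,
        is_readable=perms[0] == "r",
        is_writable=perms[1] == "w",
        is_executable=perms[2] == "x",
        is_shared=perms[3] == "s",
        is_private=perms[3] == "p",
        offset=int(offset, 16),
        device=(major, minor),
        inode=int(inode),  # the inode is decimal
        pathname=pathname,
    )


# A made-up sample line, not taken from a real process.
sample = "00400000-0040c000 r-xp 00000000 08:01 1234    /usr/bin/cat"
parsed = parse_maps_line(sample)
print(hex(parsed.begin_address), parsed.is_executable, parsed.pathname)
```

Even this short version has to remember which fields are hexadecimal and which are decimal, which is the kind of detail that tends to go wrong in ad-hoc parsers.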
Motivation
The motivations behind procmaps are twofold.
First: I do program analysis and instrumentation research at my day job. Time and time again, I need to obtain information about the memory layout of a program that I’m instrumenting (or would like to instrument). This almost always means opening /proc/<pid>/maps, writing an ad-hoc parser, getting the field(s) I want, and then getting on with my life.
Doing this over and over again has made me realize that it’s an ideal task for a small, self-contained Rust library:
- The “maps” format is line-oriented and practically frozen, with no ambiguities. Rust has many high quality PEG and parser combinator libraries that are well suited to the task.
- Writing ad-hoc parsers for it is bad™, especially when those parsers are written in C and/or C++.
- Having a small library with a small API surface would make exposure to other languages (including C and C++) trivial.
Second: I started learning Rust about a year ago, and have been looking for new challenges in it. Interoperating with another language (especially one with radically different memory semantics, like Python) is an obvious choice.
Structure
The procmaps module is a plain old Rust crate. Really.
The only differences are in the Cargo.toml:
    [lib]
    crate-type = ["cdylib"]

    [package.metadata.maturin]
    classifier = [
      "Programming Language :: Rust",
      "Operating System :: POSIX :: Linux",
    ]
(Other settings under package.metadata.maturin are available for e.g. managing Python-side dependencies, but procmaps doesn’t need them. More details are available here.)
In terms of code, the crate is structured like a normal Rust library. PyO3 only requires a few pieces of sugar to promote everything into Python-land:
Modules
Python modules are created by decorating a Rust function with #[pymodule].
This function then uses the functions of the PyModule argument that it takes to load the module’s functions and classes.
For example, here is the Python-visible procmaps module in its entirety:
    #[pymodule]
    fn procmaps(_py: Python, m: &PyModule) -> PyResult<()> {
        m.add_class::<Map>()?;
        m.add_wrapped(wrap_pyfunction!(from_pid))?;
        m.add_wrapped(wrap_pyfunction!(from_path))?;
        m.add_wrapped(wrap_pyfunction!(from_str))?;

        Ok(())
    }
Functions
Module-level functions are trivial to create: they’re just normal Rust functions, marked with #[pyfunction]. They’re loaded into modules via add_wrapped + wrap_pyfunction!, as seen above. Alternatively, they can be created within a module definition (i.e., nested within the #[pymodule] function) via the #[pyfn] decorator.
Python-visible functions return a PyResult<T>, where T implements IntoPy<PyObject>. PyO3 helpfully provides an implementation of this trait for many core types; a full table is here. This includes Option<T>, making it painless to turn Rust-level functions that return Options into Python-level functions that can return None.
procmaps doesn’t make use of them, but PyO3 also supports variadic arguments and keyword arguments. Details on those are available here.
Here’s a trivial Python-exposed function that does integer division, returning None if division by zero is requested:
    #[pyfunction]
    fn idiv(dividend: i64, divisor: i64) -> PyResult<Option<i64>> {
        if divisor == 0 {
            Ok(None)
        } else {
            Ok(Some(dividend / divisor))
        }
    }
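As a rough guide to what a Python caller would observe, here is a pure-Python model of the same function. idiv_model is a hypothetical name used only for this sketch; it is not part of procmaps or PyO3. The one subtlety it makes explicit is that Rust's / on i64 truncates toward zero, while Python's // floors, so the two disagree for negative operands.

```python
from typing import Optional


def idiv_model(dividend: int, divisor: int) -> Optional[int]:
    """Pure-Python model of the Rust idiv above (hypothetical helper)."""
    if divisor == 0:
        # Mirrors Ok(None): the caller receives None instead of an exception.
        return None
    # Rust's `/` on i64 truncates toward zero; Python's `//` floors,
    # so compute on magnitudes and restore the sign afterwards.
    quotient = abs(dividend) // abs(divisor)
    return -quotient if (dividend < 0) != (divisor < 0) else quotient


print(idiv_model(7, 2))
print(idiv_model(-7, 2), -7 // 2)
print(idiv_model(1, 0))
```

The model ignores the i64 range limits of the Rust version, so it should be read as a description of the return values only.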
Classes
Classes are loaded into modules via the add_class function, as seen in the module definition.
Just like modules, they’re managed almost entirely behind a single decorator, this time on a Rust struct. Here is the entirety of the procmaps.Map class definition:
    #[pyclass]
    struct Map {
        inner: rsprocmaps::Map,
    }
procmaps doesn’t need them, but trivial getters and setters can be added to the members of a class with #[pyo3(get, set)]. For example, the following creates a Point class:
    #[pyclass]
    struct Point {
        #[pyo3(get, set)]
        x: i64,
        #[pyo3(get, set)]
        y: i64,
    }
…for which the following would be possible in Python:
    # get_unit_point not shown above
    from pointlib import get_unit_point

    p = get_unit_point()
    print(p.x, p.y)

    p.x = 100
    p.y = -p.x
    print(p.x, p.y)
Using #[pyclass] on Foo auto-implements IntoPy<PyObject> for Foo, making it easy to return your custom classes from any function (as above) or member method (as below).
Member methods
Just as Python-visible classes are defined via #[pyclass] on Rust structs, Python-visible member methods are declared via the #[pymethods] attribute on Rust impls for those structures.
Member methods return PyResult<T>, just like functions do:
    #[pymethods]
    impl Point {
        fn invert(&self) -> PyResult<Point> {
            Ok(Point { x: self.y, y: self.x })
        }
    }
…allows for the following:
    # get_unit_point not shown above
    from pointlib import get_unit_point

    p = get_unit_point()
    p_inv = p.invert()
By default, PyO3 forbids the creation of Rust-defined classes within Python code. To allow their creation, just add a function with the #[new] attribute to the #[pymethods] impl block. This creates a __new__ Python method rather than __init__; PyO3 doesn’t support the latter.
For example, here’s a constructor for the contrived Point class above:
    #[pymethods]
    impl Point {
        #[new]
        fn new(x: i64, y: i64) -> Self {
            Point { x, y }
        }
    }
…which allows for:
    from pointlib import Point

    p = Point(100, 0)
    p_inv = p.invert()
    # invert() swaps x and y, so the inverted point has y == 100
    assert p_inv.y == 100
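For readers less familiar with the __new__ versus __init__ distinction mentioned above, the following pure-Python analogue may help. It is only a sketch of the shape of the class as seen from Python, under the assumption that all state is set up at allocation time; it says nothing about how PyO3 implements this internally.

```python
class Point:
    """Pure-Python analogue of the Rust-backed Point (illustration only)."""

    __slots__ = ("x", "y")

    def __new__(cls, x: int, y: int) -> "Point":
        # All initialization happens here; there is no __init__ at all.
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self

    def invert(self) -> "Point":
        # Same behavior as the Rust invert: swap the two coordinates.
        return Point(self.y, self.x)


p = Point(100, 0)
p_inv = p.invert()
print(p_inv.x, p_inv.y)
```

From the caller's side nothing changes: Point(100, 0) works the same way whether the class is initialized in __new__ or in __init__.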
Exceptions and error propagation
As mentioned above, (most) Python-visible functions and methods return PyResult<T>.
The Err half of PyResult is PyErr, and these values get propagated as Python exceptions. The pyo3::exceptions module contains structures that parallel the standard Python exceptions, each of which provides a py_err(String) function to produce an appropriate PyErr.
Creating a brand new Python-level exception takes a single line with the create_exception! macro. Here’s how procmaps creates a procmaps.ParseError exception that inherits from the standard Python Exception class:
    use pyo3::exceptions::Exception;

    // N.B.: The first argument is the module name,
    // i.e. the function declared with #[pymodule].
    create_exception!(procmaps, ParseError, Exception);
Similarly, marshalling Rust Error types into PyErrs is as simple as impl std::convert::From<ErrorType> for PyErr.
Here’s how procmaps turns some of its errors into standard Python IOErrors and others into the custom procmaps.ParseError exception:
    // N.B.: The newtype here is only necessary because Error comes from an
    // external crate (rsprocmaps).
    struct ProcmapsError(Error);

    impl std::convert::From<ProcmapsError> for PyErr {
        fn from(err: ProcmapsError) -> PyErr {
            match err.0 {
                Error::Io(e) => IOError::py_err(e.to_string()),
                Error::ParseError(e) => ParseError::py_err(e.to_string()),
                Error::WidthError(e) => ParseError::py_err(e.to_string()),
            }
        }
    }
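On the Python side, this mapping means a caller can tell the two failure categories apart with ordinary except clauses. The sketch below models that with stand-ins so it runs without procmaps installed: the ParseError class, describe, bad_parse, and missing_file are all hypothetical names for this example. In Python 3, IOError is an alias of OSError, so catching OSError covers the I/O case.

```python
class ParseError(Exception):
    """Stand-in for procmaps.ParseError, which also derives from Exception."""


def describe(loader, *args) -> str:
    """Run a loader and report which failure category, if any, it hit."""
    try:
        loader(*args)
    except ParseError as exc:
        return f"malformed maps data: {exc}"
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        return f"could not read maps file: {exc}"
    return "ok"


def bad_parse(_text: str) -> None:
    # Hypothetical loader that fails the way a parse error would.
    raise ParseError("unexpected token")


def missing_file(path: str) -> None:
    # Opening a nonexistent path raises FileNotFoundError, an OSError subclass.
    with open(path):
        pass


print(describe(bad_parse, "not a maps line"))
print(describe(missing_file, "/nonexistent/procmaps-demo/maps"))
```

With the real library, the loaders would be procmaps.from_str and procmaps.from_path, and ParseError would be imported from procmaps rather than defined locally.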
Compilation and distribution
With everything above, cargo build just works: it produces a Python-loadable shared object.
Unfortunately, it does it using the cdylib naming convention, meaning that cargo build for procmaps produces libprocmaps.so, rather than one of the naming conventions that Python knows how to look for when searching $PYTHONPATH.
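To see which file names the running interpreter would actually accept for an extension module, the standard library exposes the list of recognized suffixes. The sketch below is a small illustration; importable_filenames is a hypothetical helper, and the exact suffixes printed depend on the platform and Python version.

```python
import importlib.machinery
from typing import List


def importable_filenames(module_name: str) -> List[str]:
    """File names this interpreter would accept for an extension module."""
    return [
        module_name + suffix
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    ]


candidates = importable_filenames("procmaps")
print(candidates)
# The cdylib output name carries a "lib" prefix, so it never matches.
print("libprocmaps.so" in candidates)
```

Every accepted name starts with the module name itself, which is why the lib-prefixed cargo output is not picked up by import procmaps.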
This is where maturin comes in: once installed, a single maturin build in the crate root puts an appropriately named pip-compatible wheel in target/wheels.
It gets even better: maturin develop will install the compiled module directly into the current virtual environment, making local development as simple as:
    $ python3 -m venv env
    $ source env/bin/activate
    (env) $ pip3 install maturin
    (env) $ maturin develop
    (env) $ python3
    >>> import procmaps
procmaps has a handy Makefile that wraps all of that; running the compiled module locally is a single make develop away.
Distribution is slightly more involved: maturin develop builds wheels that are compatible with the local machine, but further restrictions on symbol versions and linkages are required to ensure that a binary wheel runs on a large variety of Linux versions and distributions.
Compliance with these constraints is normally enforced in one of two ways:
- Packages are compiled into binary wheels, and then audited (and potentially repaired) via the PyPA’s auditwheel before release.
- Packages are compiled into binary wheels within a wholly controlled runtime environment, such as the PyPA’s manylinux Docker containers.
Distribution with maturin takes the latter approach: the maturin developers have derived a Rust build container from the PyPA’s standard manylinux container, making fully compatible builds (again, from the crate root) as simple as:
    # optional: do `build --release` for release-optimized builds
    $ docker run --rm -v $(pwd):/io konstin2/maturin build
This command, like a normal maturin build, drops the compiled wheel(s) into target/wheels. Because it runs inside of the standard manylinux container, it can and does automatically build wheels for a wide variety of Python versions (Python 3.5 through 3.8, as of writing).
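Each wheel that lands in target/wheels encodes its compatibility in its file name, following the wheel naming convention from PEP 427. As a rough sketch, the tags can be pulled apart as below; parse_wheel_filename and WheelTags are hypothetical names, and the example file name (including its version number) is invented for illustration rather than copied from a real procmaps release.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # An optional build tag may sit between the version and the Python tag.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: " + filename)
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


# Hypothetical file name; the version number is invented for illustration.
example = "procmaps-0.0.0-cp38-cp38-manylinux1_x86_64.whl"
tags = parse_wheel_filename(example)
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

pip compares these tags against the running interpreter and platform, which is why one wheel per supported Python version is needed.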
From here, distribution to PyPI is as simple as twine upload target/wheels/* or maturin publish. procmaps currently uses the former, as releases are handled via GitHub Actions using the PyPA’s excellent gh-action-pypi-publish action.
Voilà: a Python module, written completely in Rust, that can be installed on the vast majority of Linux distributions with absolutely no dependencies on Rust itself. Even the non-maturin metadata in Cargo.toml is propagated correctly!
Wrapup
I only ran into one small hiccup while working on procmaps: I tried to add a Map.__contains__ method to allow for inclusion checks with the in operator, e.g.:
    fn __contains__(&self, addr: u64) -> PyResult<bool> {
        Ok(addr >= self.inner.address_range.begin && addr < self.inner.address_range.end)
    }
…but this didn’t work, for whatever reason, despite working when called manually:
    >>> 4194304 in map_
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: argument of type 'Map' is not iterable
    >>> map_.__contains__(4194304)
    True
There’s probably a reasonable explanation for this in the Python data model that I haven’t figured out.
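One plausible explanation, offered as an assumption rather than a confirmed diagnosis of the procmaps case: CPython looks up special methods such as __contains__ on the object's type rather than through ordinary attribute access, so a method that is reachable by name but not registered on the type is invisible to the in operator. The pure-Python sketch below reproduces the same symptom; Region, RegionWithIn, and contains are hypothetical names for the example.

```python
class Region:
    """Hypothetical half-open address range [begin, end)."""

    def __init__(self, begin: int, end: int) -> None:
        self.begin = begin
        self.end = end


def contains(region: Region, addr: int) -> bool:
    return region.begin <= addr < region.end


# Case 1: __contains__ is attached to the instance only.
instance_only = Region(0x400000, 0x40C000)
instance_only.__contains__ = lambda addr: contains(instance_only, addr)

# An explicit call goes through normal attribute lookup and succeeds.
print(instance_only.__contains__(4194304))
try:
    4194304 in instance_only
except TypeError as exc:
    # The `in` operator consulted the type, which defines no __contains__.
    print("TypeError:", exc)


# Case 2: __contains__ is defined on the type, so `in` finds it.
class RegionWithIn(Region):
    __contains__ = contains


print(4194304 in RegionWithIn(0x400000, 0x40C000))
```

If the same mechanism is at play here, the fix would be on the binding side, by registering the method in whatever way PyO3 uses to populate the type's sequence or mapping slots.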
By and large, the process of writing a Python module in Rust was extremely pleasant. I didn’t have to write a single line of Python (or even Python-specific configuration) until I wanted to add unit tests. Both PyO3 and maturin are incredibly polished, and the PyPA’s efforts to provide manylinux build environments made compatible builds a breeze.