Reducing memory consumption in librsvg, part 3: slack space in Bézier paths

栏目: IT技术 · 发布时间: 5年前

内容简介:We got aAThe

We got a bug with a gigantic SVG of a map extracted from OpenStreetMap, and it has about 600,000 elements. Most of them are <path> , that is, specifications for Bézier paths.

A <path> can look like this:

<path d="m 2239.05,1890.28 5.3,-1.81"/>

The d attribute contains a list of commands to create a Bézier path, very similar to PostScript's operators. Librsvg has the following to represent those commands:

pub enum PathCommand {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    CurveTo(CubicBezierCurve),
    Arc(EllipticalArc),
    ClosePath,
}

Those commands get stored in an array, a Vec inside a PathBuilder :

pub struct PathBuilder {
    path_commands: Vec<PathCommand>,
}

Librsvg translates each of the commands inside a <path d="..."/> into a PathCommand and pushes it into the Vec in the PathBuilder . When it is done parsing the attribute, the PathBuilder remains as the final version of the path.

To let a Vec grow efficiently as items are pushed into it, Rust makes the Vec grow by powers of 2. When we add an item, if the capacity of the Vec is full, its buffer gets realloc() ed to twice its capacity. That way there are only O(log₂n) calls to realloc() , where n is the total number of items in the array.

However, this means that once we are done adding items to the Vec , there may still be some free space in it: the capacity exceeds the length of the array . The invariant is that vec.capacity() >= vec.len() .

First I wanted to shrink the PathBuilder s so that they have no extra capacity in the end.

First step: convert to Box<[T]>

A "boxed slice" is a contiguous array in the heap, that cannot grow or shrink. That is, it has no extra capacity, only a length.

Vec has a method into_boxed_slice which does eactly that: it consumes the vector and converts it into a boxed slice without extra capacity. In its innards, it does a realloc() on the Vec 's buffer to match its length.

Let's see the numbers that Massif reports:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 23 22,751,613,855    1,560,916,408    1,493,746,540    67,169,868            0
                                       ^^^^^^^^^^^^^
                                           before

 30 22,796,106,012    1,553,581,072    1,329,943,324   223,637,748            0
                                       ^^^^^^^^^^^^^
                                           after

That is, we went from using 1,493,746,540 bytes on the heap to using 1,329,943,324 bytes. Simply removing extra capacity from the path commands saves about 159 MB for this particular file.

Second step: make the allocator do less work

However, the extra-heap column in that table has a number I don't like: there are 223,637,748 bytes in malloc() metadata and unused space in the heap.

I suppose that so many calls to realloc() make the heap a bit fragmented.

It would be good to be able to read most of the <path d="..."/> to temporary buffers that don't need so many calls to realloc() , and that in the end get copied to exact-sized buffers, without extra capacity.

We can do just that with the smallvec crate. A SmallVec has the same API as Vec , but it can store small arrays directly in the stack, without an extra heap allocation. Once the capacity is full, the stack buffer "spills" into a heap buffer automatically.

Most of the d attributes in the huge file in thebug have fewer than 32 commands. That is, if we use the following:

pub struct PathBuilder {
    path_commands: SmallVec<[PathCommand; 32]>,
}

We are saying that there can be up to 32 items in the SmallVec without causing a heap allocation; once that is exceeded, it will work like a normal Vec .

At the end we still do into_boxed_slice to turn it into an independent heap allocation with an exact size.

This reduces the extra-heap quite a bit:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 33 24,139,598,653    1,416,831,176    1,329,943,212    86,887,964            0
                                                        ^^^^^^^^^^

Also, the total bytes shrink from 1,553,581,072 to 1,416,831,176 — we have a smaller heap because there is not so much work for the allocator, and there are a lot fewer temporary blocks when parsing the d attributes.

Making the code prettier

I put in the following:

/// This one is mutable
pub struct PathBuilder {
    path_commands: SmallVec<[PathCommand; 32]>,
}

/// This one is immutable
pub struct Path {
    path_commands: Box<[PathCommand]>,
}

impl PathBuilder {
    /// Consumes the PathBuilder and converts it into an immutable Path
    pub fn into_path(self) -> Path {
        Path {
            path_commands: self.path_commands.into_boxed_slice(),
        }
    }
}

With that, PathBuilder is just a temporary struct that turns into an immutable Path once we are done feeding it. Path contains a boxed slice of the exact size, without any extra capacity.

Next steps

All the coordinates in librsvg are stored as f64 , double-precision floating point numbers. The SVG/CSS spec says that single-precision floats are enough, and that 64-bit floats should be used only for geometric transformations.

I'm a bit scared to make that change; I'll have to look closely at the results of the test suite to see if rendered files change very much. I suppose even big maps require only as much precision as f32 — after all, that is what OpenStreetMap uses.

References


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

创业者

创业者

[美] 杰西卡·利文斯顿 / 夏吉敏 / 机械工业出版社 / 2010-5 / 58.00元

本书源自作者对32个IT行业创业者的访谈,包括Apple, Gmail, hotmail, TiVo, Flickr, Lotus 及 Yahoo公司等著名公司,主题为创业初期的人和事。 对于做梦都想创业的人来说,这本书一定要读,可以看看前辈高人是如何创业的。 对于希望了解企业家成功历程和经验的人来说,这是一本必读之书,因为里面有很多成功人士年轻时的故事,你可以好象看着他们长大一样,知......一起来看看 《创业者》 这本书的介绍吧!

在线进制转换器
在线进制转换器

各进制数互转换器

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

HSV CMYK 转换工具
HSV CMYK 转换工具

HSV CMYK互换工具