Reducing memory consumption in librsvg, part 3: slack space in Bézier paths

栏目: IT技术 · 发布时间: 4年前

内容简介:We got aAThe

We got a bug with a gigantic SVG of a map extracted from OpenStreetMap, and it has about 600,000 elements. Most of them are <path> , that is, specifications for Bézier paths.

A <path> can look like this:

<path d="m 2239.05,1890.28 5.3,-1.81"/>

The d attribute contains a list of commands to create a Bézier path, very similar to PostScript's operators. Librsvg has the following to represent those commands:

pub enum PathCommand {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    CurveTo(CubicBezierCurve),
    Arc(EllipticalArc),
    ClosePath,
}

Those commands get stored in an array, a Vec inside a PathBuilder :

pub struct PathBuilder {
    path_commands: Vec<PathCommand>,
}

Librsvg translates each of the commands inside a <path d="..."/> into a PathCommand and pushes it into the Vec in the PathBuilder . When it is done parsing the attribute, the PathBuilder remains as the final version of the path.

To let a Vec grow efficiently as items are pushed into it, Rust makes the Vec grow by powers of 2. When we add an item, if the capacity of the Vec is full, its buffer gets realloc() ed to twice its capacity. That way there are only O(log₂n) calls to realloc() , where n is the total number of items in the array.

However, this means that once we are done adding items to the Vec , there may still be some free space in it: the capacity exceeds the length of the array . The invariant is that vec.capacity() >= vec.len() .

First I wanted to shrink the PathBuilder s so that they have no extra capacity in the end.

First step: convert to Box<[T]>

A "boxed slice" is a contiguous array in the heap, that cannot grow or shrink. That is, it has no extra capacity, only a length.

Vec has a method into_boxed_slice which does eactly that: it consumes the vector and converts it into a boxed slice without extra capacity. In its innards, it does a realloc() on the Vec 's buffer to match its length.

Let's see the numbers that Massif reports:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 23 22,751,613,855    1,560,916,408    1,493,746,540    67,169,868            0
                                       ^^^^^^^^^^^^^
                                           before

 30 22,796,106,012    1,553,581,072    1,329,943,324   223,637,748            0
                                       ^^^^^^^^^^^^^
                                           after

That is, we went from using 1,493,746,540 bytes on the heap to using 1,329,943,324 bytes. Simply removing extra capacity from the path commands saves about 159 MB for this particular file.

Second step: make the allocator do less work

However, the extra-heap column in that table has a number I don't like: there are 223,637,748 bytes in malloc() metadata and unused space in the heap.

I suppose that so many calls to realloc() make the heap a bit fragmented.

It would be good to be able to read most of the <path d="..."/> to temporary buffers that don't need so many calls to realloc() , and that in the end get copied to exact-sized buffers, without extra capacity.

We can do just that with the smallvec crate. A SmallVec has the same API as Vec , but it can store small arrays directly in the stack, without an extra heap allocation. Once the capacity is full, the stack buffer "spills" into a heap buffer automatically.

Most of the d attributes in the huge file in thebug have fewer than 32 commands. That is, if we use the following:

pub struct PathBuilder {
    path_commands: SmallVec<[PathCommand; 32]>,
}

We are saying that there can be up to 32 items in the SmallVec without causing a heap allocation; once that is exceeded, it will work like a normal Vec .

At the end we still do into_boxed_slice to turn it into an independent heap allocation with an exact size.

This reduces the extra-heap quite a bit:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 33 24,139,598,653    1,416,831,176    1,329,943,212    86,887,964            0
                                                        ^^^^^^^^^^

Also, the total bytes shrink from 1,553,581,072 to 1,416,831,176 — we have a smaller heap because there is not so much work for the allocator, and there are a lot fewer temporary blocks when parsing the d attributes.

Making the code prettier

I put in the following:

/// This one is mutable
pub struct PathBuilder {
    path_commands: SmallVec<[PathCommand; 32]>,
}

/// This one is immutable
pub struct Path {
    path_commands: Box<[PathCommand]>,
}

impl PathBuilder {
    /// Consumes the PathBuilder and converts it into an immutable Path
    pub fn into_path(self) -> Path {
        Path {
            path_commands: self.path_commands.into_boxed_slice(),
        }
    }
}

With that, PathBuilder is just a temporary struct that turns into an immutable Path once we are done feeding it. Path contains a boxed slice of the exact size, without any extra capacity.

Next steps

All the coordinates in librsvg are stored as f64 , double-precision floating point numbers. The SVG/CSS spec says that single-precision floats are enough, and that 64-bit floats should be used only for geometric transformations.

I'm a bit scared to make that change; I'll have to look closely at the results of the test suite to see if rendered files change very much. I suppose even big maps require only as much precision as f32 — after all, that is what OpenStreetMap uses.

References


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

平台转型

平台转型

陈威如、王诗一、余卓轩(统筹) / 中信出版社 / 2016-1-10 / 58

《平台战略》续篇,陈威如等关于企业平台转型最新力作! 平台带来的商业革命已改写了现在及未来的企业生存规则,而这股浪潮已经从互联网行业漫延到了其他多种行业之中!如果说,过去10 年是平台商业模式在互联网行业的爆发期,那未来10 年,将是平台商业模式在传统行业转型应用上的黄金时代。 平台思维不再是互联网行业的专用词,它可以用来解构价值链,可以被运用到组织架构的设计中,更能够帮助企业升级竞争......一起来看看 《平台转型》 这本书的介绍吧!

XML 在线格式化
XML 在线格式化

在线 XML 格式化压缩工具

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换

HSV CMYK 转换工具
HSV CMYK 转换工具

HSV CMYK互换工具