unrpa_rs - A command line utility & library to extract RenPy archives

Category: IT Technology · Published: 4 years ago

In this post I want to introduce unrpa_rs, a command line utility and library to extract RenPy archives (RPAs), written in the Rust programming language. It can be used to extract the various assets that have been bundled in the RPA format. Currently RPAv3.2, RPAv3, and RPAv2 are supported.

Motivation

For a long time I had been interested in building something with Rust, but until now I never found the time to actually do it. After finishing my bachelor thesis last semester I finally had enough time to start this little side project and get better acquainted with Rust.

To start with the right level of difficulty I picked a repo on GitHub called rpatool, which is a Python tool to create, modify, and extract RenPy archive files. RenPy is an open source Python game engine used primarily to create visual novels. Thus, with the basic functionality already known, I decided to implement the extraction functionality in Rust.

CLI Usage

USAGE:
    unrpa_rs [FLAGS] <INPUT>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
    -v, --verbose    Increase verbosity level (-v, -vv, -vvv, etc.)

ARGS:
    <INPUT>    The path to the archive file to read from

Disclaimer

Use this tool only on archives whose authors allow modification or extraction. Unauthorized use is highly discouraged, since it most likely constitutes a license violation.

How it works

After reading the Python source code and grasping the underlying idea, it turns out the involved steps are pretty straightforward. I will describe them here briefly in the order they occur. Sometimes a snippet of the source code is listed for the important bits.

Opening the file descriptor and extracting metadata

The first steps consist of opening the file descriptor to get access to the referenced file and extracting some metadata about the archive. The metadata sits in the first line and is separated by spaces. The first string is the magic literal which denotes the version of the RPA format, e.g. “RPA-3.2”, “RPA-3.0”, “RPA-2.0”. The remaining data is the byte offset we need to jump to in order to extract the indices, i.e. the files present in the archive, and the obfuscation key we need to deobfuscate the index data. This key is constructed by successively XORing the listed sub-keys together.

// XORs all hex-encoded sub-keys found in the header fields into a single key.
fn construct_obfuscation_key<S: AsRef<str>>(
    rpa_version: &RpaVersion,
    metadata: &[S],
) -> IntLen {
    // index of the first sub-key within the space-separated header fields
    let first_key = match *rpa_version {
        RpaVersion::V3 => 2,
        RpaVersion::V3_2 => 3,
        // RPAv2 is not obfuscated
        RpaVersion::V2 => return 0,
    };

    metadata[first_key..]
        .iter()
        .fold(0, |acc: IntLen, sub_key| {
            acc ^ IntLen::from_str_radix(sub_key.as_ref(), 16).unwrap()
        })
}
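To make the header layout more concrete, here is a minimal, standard-library-only sketch of parsing the whole first line. It is not taken from unrpa_rs: the names `Header` and `parse_header` are hypothetical, `u64` stands in for `IntLen`, and it assumes the offset is hex-encoded like the sub-keys. The positions of the sub-keys mirror the function above.

```rust
/// Header fields of an archive, as described in the text above.
#[derive(Debug, PartialEq)]
struct Header {
    magic: String,
    offset: u64,
    key: u64,
}

/// Parses the first line of an archive, e.g. "RPA-3.0 <offset> <key>".
/// Returns `None` for unknown magic strings or malformed hex fields.
fn parse_header(line: &str) -> Option<Header> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    let magic = *fields.first()?;

    // Index of the first sub-key; `None` means the version has no key.
    let first_key = match magic {
        "RPA-3.2" => Some(3),
        "RPA-3.0" => Some(2),
        "RPA-2.0" => None,
        _ => return None,
    };

    // Assumption: the offset is stored as a hexadecimal string.
    let offset = u64::from_str_radix(fields.get(1)?, 16).ok()?;

    // XOR all sub-keys together; the key stays 0 if there is none.
    let mut key = 0u64;
    if let Some(start) = first_key {
        for sub_key in fields.get(start..)? {
            key ^= u64::from_str_radix(sub_key, 16).ok()?;
        }
    }

    Some(Header { magic: magic.to_string(), offset, key })
}
```

Unlike the function above, this sketch returns `None` instead of panicking on malformed input, which is a design choice of the example rather than a description of the tool.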

Extracting the indices

The next step consists of jumping to the offset, reading all bytes until EOF, and running a zlib decompression to get access to the decompressed byte buffer. Afterwards, we need to deserialize all indices from the Python pickle format.

// seek cursor to the decoded offset
reader.seek(SeekFrom::Start(offset))?;

let mut bytes: Vec<u8> = Vec::new();
// read everything until EOF
let bytes_read = reader.read_to_end(&mut bytes)?;
let mut decoded_bytes: Vec<u8> = Vec::with_capacity(2 * bytes_read);

// read the content by decoding it with zlib
ZlibDecoder::new(&bytes[..]).read_to_end(&mut decoded_bytes)?;
let deserialized_indices: RpaIdx = serde_pickle::from_slice(&decoded_bytes)?;

For the zlib decompression I used the flate2 crate, and for the pickle deserialization serde in combination with serde-pickle. Now we have a list of indices with fields like offset and len. However, for RPAv3 and RPAv3.2 we need to deobfuscate these fields by once more applying XOR with the previously obtained obfuscation key.
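The deobfuscation of an index entry can be sketched as follows. This is an illustration rather than the code from unrpa_rs: `RawIndex` and `deobfuscate` are hypothetical names, and `u64` is assumed for the field and key types.

```rust
/// One archive entry as stored in the index: where its bytes start and how
/// many there are. Both fields are XOR-obfuscated in RPAv3 and RPAv3.2.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RawIndex {
    offset: u64,
    len: u64,
}

/// Applies the obfuscation key to both fields. Since XOR is its own inverse,
/// the same function obfuscates and deobfuscates; a key of 0 changes nothing.
fn deobfuscate(index: RawIndex, key: u64) -> RawIndex {
    RawIndex {
        offset: index.offset ^ key,
        len: index.len ^ key,
    }
}
```

Because the key for RPAv2 is 0, the same function could be applied to every version without a special case.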

Reading the byte buffer of the indices

With the previous step we now have all the metadata needed to read the byte buffer of every index into memory. We only need to jump to the individual offset of the index and read exactly len bytes. However, since RPAv3.2 has a prefix field, we need to encode it with latin1 (also called ISO-8859-1) and subtract the raw prefix length from len, the number of bytes to read. In this case the byte buffer is appended to the encoded prefix and the result is returned. However, all RPAv3.2 archives I have seen up to this point have an empty prefix, which also yields an empty encoded prefix, so this portion of the code is rather untested.

self.reader.seek(SeekFrom::Start(offset))?;

let desired_capacity = len as usize - prefix.unwrap_or("").len();
let mut encoded_prefix = ISO_8859_1.encode(prefix.unwrap_or(""), EncoderTrap::Strict)?;

let mut buf = vec![0u8; desired_capacity];
// now read exactly `desired_capacity` bytes
self.reader.read_exact(&mut buf)?;
assert_eq!(desired_capacity, buf.len());

// append the byte buffer to the encoded prefix if the prefix is not empty
if self.version == RpaVersion::V3 && !encoded_prefix.is_empty() {
    encoded_prefix.append(&mut buf);
    Ok(encoded_prefix)
} else {
    Ok(buf)
}

For the latin1 encoding process I used the encoding crate. Since we now have the raw byte buffer, we can write it directly to disk. This is rather uninteresting, so I am not listing it here.
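For readers who want to try the prefix handling without extra crates, here is a self-contained sketch using only the standard library. The names `encode_latin1` and `read_entry` are hypothetical and replace the `encoding` crate call with a hand-written encoder. Note one deliberate difference: the sketch subtracts the length of the encoded prefix, whereas the snippet above subtracts the length of the raw string.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Encodes `s` as ISO-8859-1: every char up to U+00FF maps to the byte with
/// the same value; any other char is rejected.
fn encode_latin1(s: &str) -> io::Result<Vec<u8>> {
    s.chars()
        .map(|c| {
            let code_point = c as u32;
            if code_point <= 0xFF {
                Ok(code_point as u8)
            } else {
                Err(io::Error::new(io::ErrorKind::InvalidData, "not Latin-1"))
            }
        })
        .collect()
}

/// Reads one entry. `len` counts the prefix bytes too, so only
/// `len - prefix length` bytes are read from the archive at `offset`.
fn read_entry<R: Read + Seek>(
    reader: &mut R,
    offset: u64,
    len: usize,
    prefix: &str,
) -> io::Result<Vec<u8>> {
    let mut out = encode_latin1(prefix)?;
    let body_len = len.checked_sub(out.len()).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "prefix longer than entry")
    })?;

    reader.seek(SeekFrom::Start(offset))?;

    // Grow the buffer and read the body directly behind the prefix.
    let start = out.len();
    out.resize(start + body_len, 0);
    reader.read_exact(&mut out[start..])?;
    Ok(out)
}
```

With an empty prefix this reduces to a plain seek followed by `read_exact`, which matches the case I have actually seen in real archives.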

Multithreading?

My original plan was to speed up the reading of all indices by reading the byte buffers in parallel. However, I soon realized this is not possible, since access to the underlying file resource cannot be shared by multiple threads. Since I am not simply reading the file sequentially, but rather need to jump to an individual byte offset for every index, I decided to use the memmap crate in order to have a file-backed immutable memory map in the form of a Cursor that I can use to jump around in the file and perform normal reading operations.
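The idea behind this can be sketched with the standard library alone: once the archive is available as an immutable byte slice, every reader can have its own Cursor, and therefore its own seek position, over the same bytes. In the sketch below a `Vec<u8>` stands in for the memory-mapped file, and `read_ranges_parallel` is a hypothetical helper. It only illustrates the sharing pattern and is not how unrpa_rs is implemented.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::sync::Arc;
use std::thread;

/// Reads several `(offset, len)` ranges in parallel. Each thread builds its
/// own `Cursor` over the same shared, immutable bytes.
fn read_ranges_parallel(
    data: Arc<Vec<u8>>,
    ranges: &[(u64, usize)],
) -> io::Result<Vec<Vec<u8>>> {
    let handles: Vec<_> = ranges
        .iter()
        .map(|&(offset, len)| {
            let data = Arc::clone(&data);
            thread::spawn(move || -> io::Result<Vec<u8>> {
                let mut cursor = Cursor::new(data.as_slice());
                cursor.seek(SeekFrom::Start(offset))?;
                let mut buf = vec![0u8; len];
                cursor.read_exact(&mut buf)?;
                Ok(buf)
            })
        })
        .collect();

    // Join in spawn order so the results line up with `ranges`.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("reader thread panicked"))
        .collect()
}
```

Spawning one thread per range keeps the example short; for an archive with many entries a thread pool would be the more sensible choice.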

Benchmarks performed with the criterion crate showed a significant performance gain with this change. You can run them yourself if you want: check out the delete-io-systemcall tag in the repo and run the benchmarks with cargo bench. Then check out v0.3.0 and run them again to see the changes in the report. Since this is an I/O-intensive project, fast disks such as SSDs mean better performance. Also, Linux and macOS in general offer better performance, since I/O on Windows is more expensive.

Feedback welcome

Since this is my first Rust project, I am open to suggestions from the community if someone knows a better way to do certain things. The GitLab repository is the best place for that.

