unrpa_rs - A command line utility & library to extract RenPy archives

栏目: IT技术 · 发布时间: 4年前

内容简介:In this post I want to introduceFor a long time I was interested in building something with Rust, however, up until now I never found the time to actually do that. After finishing my bachelor thesis last semester I finally had enough time to start this lit

In this post I want to introduce unrpa_rs which is a command line utility and library to extract RenPy archives ( RPAs ), written in the Rust programming language. This can be used to extract various assets that have been bundled in the RPA format. Currently RPAv3.2, RPAv3, and RPAv2 are supported.

Motivation

For a long time I was interested in building something with Rust, however, up until now I never found the time to actually do that. After finishing my bachelor thesis last semester I finally had enough time to start this little side project to get better acquainted with Rust.

To start with the right level of difficulty I found a repo on github called rpatool which is Python tool create, modify and extract RenPy archive files. RenPy is an open source Python game engine to primarily create visual novels. Thus, with the basic functionality already known, I decided to implement the extraction functionality in Rust.

CLI Usage

USAGE:
    unrpa_rs [FLAGS] <INPUT>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
    -v, --verbose    Increase verbosity level (-v, -vv, -vvv, etc.)

ARGS:
    <INPUT>    The path to the archive file to read from

Disclaimer

Use this tool only on archives on which the authors allow modification or extraction. The unauthorized use is highly discouraged since this poses most likely a license violation.

How it works

After reading the Python source code and grasping the underlying idea, it turns out the involved steps are pretty straightforward. I will describe them here briefly in the order they occur. Sometimes a snippet of the source code is listed for the important bits.

Opening the file descriptor and extracting metadata

The first steps consist of opening the file descriptor to get access to the referenced file and to extract some metadata about the archive. The metadata sits right in the first line and is separated by spaces. The first string is the magic literal which denotes the version of the RPA format, e.g. “RPA-3.2”, “RPA-3.0”, “RPA-2.0”. The remaining data is the byte offset we need to jump to extract the indices , i.e. the files present in the archive, and the obfuscation key we need to deobfuscate the indices data. This key is simply constructed by subsequently applying XOR with the listed keys.

fn construct_obfuscation_key<S: AsRef<str>>(
        rpa_version: &RpaVersion,
        metadata: &[S],
    ) -> IntLen {
        let key: IntLen = match *rpa_version {
            RpaVersion::V3 => metadata.as_ref()[2..]
                .iter()
                .fold(0, |acc: IntLen, sub_key| {
                    acc ^ IntLen::from_str_radix(sub_key.as_ref(), 16).unwrap()
                }),
            RpaVersion::V3_2 => metadata.as_ref()[3..]
                .iter()
                .fold(0, |acc: IntLen, sub_key| {
                    acc ^ IntLen::from_str_radix(sub_key.as_ref(), 16).unwrap()
                }),
            RpaVersion::V2 => 0,
        };

        key
    }

Extracting the indices

The next steps consists of jumping to the offset, reading all bytes until EOF , and running a zlib decompression to get access to the decompressed byte buffer. Afterwards, we need to deserialize all indices with the Python pickle format.

// seek cursor to the decoded offset
reader.seek(SeekFrom::Start(offset))?;

let mut bytes: Vec<u8> = Vec::new();
// read everything util EOF
let bytes_read = reader.read_to_end(&mut bytes)?;
let mut decoded_bytes: Vec<u8> = Vec::with_capacity(2 * bytes_read);

// read the content by decoding it with zlib
ZlibDecoder::new(&bytes[..]).read_to_end(&mut decoded_bytes)?;
let deserialized_indices: RpaIdx = serde_pickle::from_slice(&decoded_bytes)?;

For the zlib decompression I used the flate2 crate, and for the pickle stuff serde in combination with serde-pickle . Now, we have a list of indices with fields like offset and len . However, for RPAv3 and RPAv3.2 we need to deobfuscate these fields by once more applying XOR with the previously yield obfuscation key.

Reading the byte buffer of the indices

With the previous step we now have all metadata to read a byte buffer of every index into memory. We only need to jump to the individual offset of the index and read exactly len bytes. However, since RPAv3.2 has a prefix field we need to encode that with latin1 or also called ISO-8859-1 and subtract the raw prefix len from the original offset. In this case byte buffer is appended at the encoded prefix and then returned. However, all RPAv3.2 I have seen up to this point have an empty prefix which yields also an empty encoded prefix. This portion of the code is rather untested.

self.reader.seek(SeekFrom::Start(offset))?;

let desired_capacity = len as usize - prefix.unwrap_or("").len();
let mut encoded_prefix = ISO_8859_1.encode(prefix.unwrap_or(""), EncoderTrap::Strict)?;

let mut buf = vec![0u8; desired_capacity];
// now read exactly `desired_capacity` bytes
self.reader.read_exact(&mut buf)?;
assert_eq!(desired_capacity, buf.len());

// append the byte vector at the prefix vector if it's not empty
if self.version == RpaVersion::V3 && !encoded_prefix.is_empty() {
    encoded_prefix.append(&mut buf);
    Ok(encoded_prefix)
} else {
    Ok(buf)
}

For the latin1 encoding process I used the encoding crate. Since we know have the raw byte buffer we can write that directly to the disk. This is rather uninteresting, so I am not listing that here.

Multithreading?

My original plan was to speed the reading process of all indices up by reading the byte buffers in parallel. However, I soon realized this is not possible since the access to underlying file resource can not be shared by multiple threads. Since I am not simply sequentially reading the file, but rather need to jump to a individual byte offset for every index, I decided to use the memmap crate in order to have a file-backed immutable memory map in the form a of Cursor I can use to jump around in the file and perform normal reading operations.

Benchmarks performed with the criterion crate showed a significant performance gain with this change. You can checkout them out yourself if you want and run them. Checkout the delete-io-systemcall tag in the repo and run the benchmarks by running cargo bench . Then checkout v0.3.0 and run them again to see the changes in the report. The nature of this being an I/O intensive project means that fast disks, such as SSDs mean better performance. Also, Linux and macOS offer in general better performance since I/O on Windows is more expensive.

Feedback welcome

Since this is my first Rust project I am open to accept suggestions from the community if they know a better way to do certain things. The gitlab repository would be the best place to go for that.


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

商战

商战

杰克•特劳特、阿尔•里斯 / 李正栓、李腾 / 机械工业出版社 / 2011-3 / 42.00元

本书重点阐述了商战中的四种常用战略形式,如防御战、进攻战、侧翼战和游击战,针对每一种形式又提出了三条应遵循的原则,以及如何在具体的商战中应用这些原则。本书分析了商战中的实际案例:可口可乐与百事可乐的战役,汉堡王与温迪斯对麦当劳的挑战以及DEC对阵IBM等。这些人们熟知品牌的案例在作者精心的组织下,使读者不仅加深了对本书中心思想的理解,而且学习了如何在实战中具体应用各种营销战略和策略的技巧。 ......一起来看看 《商战》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

MD5 加密
MD5 加密

MD5 加密工具

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器