In this post I want to introduce unrpa_rs, a command line utility and library written in the Rust programming language for extracting RenPy archives (RPAs). It can be used to extract the various assets that have been bundled in the RPA format. Currently, RPAv3.2, RPAv3, and RPAv2 are supported.
Motivation
For a long time I was interested in building something with Rust, but up until now I never found the time to actually do it. After finishing my bachelor thesis last semester, I finally had enough time to start this little side project and get better acquainted with Rust.
To start at the right level of difficulty, I picked a repo on GitHub called rpatool, a Python tool to create, modify, and extract RenPy archive files. RenPy is an open source Python game engine used primarily to create visual novels. With the basic functionality already known, I decided to implement the extraction functionality in Rust.
CLI Usage
```
USAGE:
    unrpa_rs [FLAGS] <INPUT>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
    -v, --verbose    Increase verbosity level (-v, -vv, -vvv, etc.)

ARGS:
    <INPUT>    The path to the archive file to read from
```
Disclaimer
Use this tool only on archives whose authors allow modification or extraction. Unauthorized use is strongly discouraged, since it most likely constitutes a license violation.
How it works
After reading the Python source code and grasping the underlying idea, I found that the steps involved are pretty straightforward. I will briefly describe them here in the order they occur, and list a snippet of the source code for the important bits.
Opening the file descriptor and extracting metadata
The first steps consist of opening the file descriptor to get access to the referenced file and extracting some metadata about the archive. The metadata sits right in the first line, with its fields separated by spaces. The first string is the magic literal, which denotes the version of the RPA format, e.g. “RPA-3.2”, “RPA-3.0”, or “RPA-2.0”. The remaining fields are the byte offset we need to jump to in order to extract the indices, i.e. the files present in the archive, and the obfuscation key we need to deobfuscate the index data. This key is constructed by successively XOR-ing the listed sub-keys, which are hexadecimal strings. RPAv2 archives are not obfuscated, so their key is simply 0.
```rust
fn construct_obfuscation_key<S: AsRef<str>>(
    rpa_version: &RpaVersion,
    metadata: &[S],
) -> IntLen {
    // The sub-keys follow the magic literal and the offset; in RPAv3.2
    // one more field precedes them.
    let sub_keys = match *rpa_version {
        RpaVersion::V3 => &metadata[2..],
        RpaVersion::V3_2 => &metadata[3..],
        // RPAv2 archives are not obfuscated.
        RpaVersion::V2 => return 0,
    };
    // XOR all hexadecimal sub-keys together.
    sub_keys.iter().fold(0, |acc: IntLen, sub_key| {
        acc ^ IntLen::from_str_radix(sub_key.as_ref(), 16).unwrap()
    })
}
```
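To see the whole first line handled in one place, here is a self-contained sketch that parses the header into the version, the index offset, and the obfuscation key. The `Version`, `Header`, and `HeaderError` types and the `parse_header` function are hypothetical names for illustration, not part of unrpa_rs, and storing the offset and key as `u64` is an assumption. It uses the same field layout as the function above: sub-keys start at field 2 for RPAv3 and at field 3 for RPAv3.2, and RPAv2 has no key.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for the archive version.
#[derive(Debug, PartialEq)]
enum Version {
    V2,
    V3,
    V3_2,
}

/// Hypothetical parsed form of an archive's first line.
#[derive(Debug, PartialEq)]
struct Header {
    version: Version,
    offset: u64,
    key: u64,
}

#[derive(Debug)]
enum HeaderError {
    UnknownMagic(String),
    MissingField,
    BadHex(ParseIntError),
}

impl From<ParseIntError> for HeaderError {
    fn from(e: ParseIntError) -> Self {
        HeaderError::BadHex(e)
    }
}

/// Parses a header line such as "RPA-3.0 <offset> <key>".
fn parse_header(line: &str) -> Result<Header, HeaderError> {
    // Metadata fields are separated by whitespace (this also drops a trailing '\n').
    let fields: Vec<&str> = line.split_whitespace().collect();
    let magic = *fields.first().ok_or(HeaderError::MissingField)?;
    // Map the magic literal to a version and the index of the first sub-key.
    let (version, key_start) = match magic {
        "RPA-2.0" => (Version::V2, None),
        "RPA-3.0" => (Version::V3, Some(2)),
        "RPA-3.2" => (Version::V3_2, Some(3)),
        other => return Err(HeaderError::UnknownMagic(other.to_string())),
    };
    // The second field is the hexadecimal byte offset of the index.
    let offset_field = fields.get(1).ok_or(HeaderError::MissingField)?;
    let offset = u64::from_str_radix(offset_field, 16)?;
    // XOR all hexadecimal sub-keys together; RPAv2 keeps a key of 0.
    let mut key = 0u64;
    if let Some(start) = key_start {
        for sub_key in fields.get(start..).unwrap_or_default() {
            key ^= u64::from_str_radix(sub_key, 16)?;
        }
    }
    Ok(Header { version, offset, key })
}
```

Unlike `construct_obfuscation_key`, this sketch returns an error instead of panicking when a hex field is malformed.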
Extracting the indices
The next step consists of jumping to the offset, reading all bytes until EOF, and running a zlib decompression to get the decompressed byte buffer. Afterwards, we need to deserialize the indices, which are stored in Python's pickle format.
```rust
use flate2::read::ZlibDecoder;
use std::io::{Read, Seek, SeekFrom};

// seek the cursor to the decoded offset
reader.seek(SeekFrom::Start(offset))?;
let mut bytes: Vec<u8> = Vec::new();
// read everything until EOF
let bytes_read = reader.read_to_end(&mut bytes)?;
let mut decoded_bytes: Vec<u8> = Vec::with_capacity(2 * bytes_read);
// decompress the index data with zlib
ZlibDecoder::new(&bytes[..]).read_to_end(&mut decoded_bytes)?;
// unpickle the decompressed bytes into the index structure
let deserialized_indices: RpaIdx = serde_pickle::from_slice(&decoded_bytes)?;
```
For the zlib decompression I used the `flate2` crate, and for the pickle deserialization `serde` in combination with `serde-pickle`.
Now we have a list of indices with fields like `offset` and `len`. However, for RPAv3 and RPAv3.2 we need to deobfuscate these fields by once more applying XOR with the previously obtained obfuscation key.
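A minimal sketch of this deobfuscation step, assuming the unpickled index has been mapped into a hypothetical `IndexEntry` struct. This mirrors rpatool, where each file name maps to a list of `(offset, len)` or `(offset, len, prefix)` tuples; the struct, its `u64` fields, and the always-present `prefix` are assumptions for illustration. Because XOR is its own inverse, applying the key once restores the original values.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of one unpickled index tuple.
#[derive(Debug, Clone, PartialEq)]
struct IndexEntry {
    offset: u64,
    len: u64,
    prefix: String,
}

/// Undoes the XOR obfuscation of RPAv3/RPAv3.2 indices in place.
/// Only `offset` and `len` are obfuscated; the prefix is left untouched.
fn deobfuscate(indices: &mut HashMap<String, Vec<IndexEntry>>, key: u64) {
    for entries in indices.values_mut() {
        for entry in entries.iter_mut() {
            entry.offset ^= key;
            entry.len ^= key;
        }
    }
}
```

For RPAv2 archives the key is 0, so calling this function would leave the values unchanged anyway.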
Reading the byte buffer of the indices
With the previous step we now have all the metadata needed to read the byte buffer of every index into memory. We only need to jump to the individual `offset` of the index and read exactly `len` bytes. Since RPAv3.2 has a `prefix` field, however, we need to encode it with latin1 (also called ISO-8859-1) and subtract the length of the encoded prefix from `len`, because `len` covers the prefix as well. The byte buffer read from the file is then appended to the encoded prefix and returned. All RPAv3.2 archives I have seen so far have an empty prefix, which yields an empty encoded prefix, so this portion of the code is largely untested.
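```rust
use encoding::all::ISO_8859_1;
use encoding::{EncoderTrap, Encoding};
use std::io::{Read, Seek, SeekFrom};

self.reader.seek(SeekFrom::Start(offset))?;
// encode the (usually empty) prefix as latin1 / ISO-8859-1
let mut encoded_prefix = ISO_8859_1.encode(prefix.unwrap_or(""), EncoderTrap::Strict)?;
// `len` includes the prefix; subtract the *encoded* length, since the UTF-8
// length of `prefix` is larger for non-ASCII latin1 characters
let desired_capacity = len as usize - encoded_prefix.len();
let mut buf = vec![0u8; desired_capacity];
// now read exactly `desired_capacity` bytes
self.reader.read_exact(&mut buf)?;
// append the byte vector to the prefix vector if the prefix is not empty,
// regardless of the archive version
if !encoded_prefix.is_empty() {
    encoded_prefix.append(&mut buf);
    Ok(encoded_prefix)
} else {
    Ok(buf)
}
```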
For the latin1 encoding I used the `encoding` crate.
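The prefix handling can also be sketched with the standard library alone. Instead of the `encoding` crate, this swaps in a hand-rolled latin1 encoder, which works because ISO-8859-1 maps the code points U+0000 to U+00FF one-to-one onto single bytes. `encode_latin1` and `read_entry` are hypothetical helpers, not unrpa_rs code; the entry is read from any `Read + Seek` source, and the encoded prefix length is subtracted from `len` as described above.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Encodes `s` as latin1 (ISO-8859-1): every char must be <= U+00FF and
/// becomes exactly one byte. Returns None for any other character.
fn encode_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            let code = c as u32;
            if code <= 0xFF {
                Some(code as u8)
            } else {
                None
            }
        })
        .collect()
}

/// Reads one archive member. `len` counts the prefix bytes too, so only
/// `len - encoded_prefix.len()` bytes are read from the source at `offset`.
fn read_entry<R: Read + Seek>(
    reader: &mut R,
    offset: u64,
    len: u64,
    prefix: &str,
) -> io::Result<Vec<u8>> {
    let mut data = encode_latin1(prefix)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "prefix is not latin1"))?;
    // Guard against a corrupt index whose prefix is longer than the entry.
    let remaining = (len as usize)
        .checked_sub(data.len())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "prefix longer than entry"))?;
    reader.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; remaining];
    // read_exact fails with UnexpectedEof if the source is too short.
    reader.read_exact(&mut buf)?;
    // Append the file bytes after the prefix (a plain copy when it is empty).
    data.extend_from_slice(&buf);
    Ok(data)
}
```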
Since we now have the raw byte buffer, we can write it directly to disk. This is rather uninteresting, so I am not listing it here.
Multithreading?
My original plan was to speed up reading all indices by reading the byte buffers in parallel. However, I soon realized that this is not possible, since access to the underlying file cannot be shared by multiple threads without them interfering with each other's seek position. Since I am not simply reading the file sequentially, but rather need to jump to an individual byte offset for every index, I decided to use the `memmap` crate to get a file-backed, immutable memory map in the form of a `Cursor` that I can use to jump around in the file and perform normal reading operations.
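The post does not say whether unrpa_rs reads entries in parallel after this change. To illustrate why an immutable, file-backed map helps in principle, here is a std-only sketch in which a plain byte slice stands in for the memory map (a read-only `memmap` map dereferences to `[u8]`). Each thread creates its own `Cursor` over the shared slice, so every thread has an independent position and no seek state is shared. `read_ranges_parallel` is a hypothetical name, and spawning one scoped thread per range (which needs Rust 1.63+ for `std::thread::scope`) is a simplification; a real tool would more likely use a bounded thread pool.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::thread;

/// Reads several (offset, len) ranges from one shared, immutable buffer in
/// parallel. Results are returned in the same order as `ranges`.
fn read_ranges_parallel(data: &[u8], ranges: &[(u64, usize)]) -> Vec<io::Result<Vec<u8>>> {
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(offset, len)| {
                s.spawn(move || -> io::Result<Vec<u8>> {
                    // A private cursor per thread: positions never interfere.
                    let mut cursor = Cursor::new(data);
                    cursor.seek(SeekFrom::Start(offset))?;
                    let mut buf = vec![0u8; len];
                    cursor.read_exact(&mut buf)?;
                    Ok(buf)
                })
            })
            .collect();
        // Join in spawn order so the output lines up with `ranges`.
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```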
Benchmarks performed with the `criterion` crate showed a significant performance gain with this change. You can check them out and run them yourself: check out the `delete-io-systemcall` tag in the repo and run the benchmarks with `cargo bench`. Then check out `v0.3.0` and run them again to see the changes in the report. Since this is an I/O-intensive project, fast disks such as SSDs mean better performance. Linux and macOS also generally perform better, since I/O on Windows is more expensive.
Feedback welcome
Since this is my first Rust project, I am open to suggestions from the community on better ways to do certain things. The GitLab repository is the best place for that.