High-speed Order-Preserving Encoder (HOPE)
HOPE is a fast dictionary-based compressor that encodes arbitrary byte strings while preserving their order. It is optimized for compressing database index keys. A detailed description can be found in our SIGMOD paper.
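To make "order-preserving" concrete, here is a minimal toy sketch. It is not HOPE's dictionary, algorithm, or API: the four-symbol alphabet, the hand-picked codes, and the `encode` function are all assumptions made for illustration. The codes are prefix-free and assigned in alphabetical order, so comparing two encoded bit strings gives the same result as comparing the original keys, while the most frequent symbol (assumed here to be "a") gets the shortest code.

```python
# Toy order-preserving prefix code over a 4-symbol alphabet.
# Hypothetical illustration only; this is NOT HOPE's dictionary or API.
CODES = {
    "a": "0",    # assumed to be the most frequent symbol: shortest code
    "b": "10",
    "c": "110",
    "d": "111",
}


def encode(key: str) -> str:
    """Concatenate the code of each symbol; returns a string of '0'/'1'."""
    return "".join(CODES[ch] for ch in key)


keys = ["dab", "abc", "ab", "b", "cad", "a"]

# Sorting by the encoded bits yields the same order as sorting the raw keys.
by_code = sorted(keys, key=encode)
print(by_code == sorted(keys))
```

The property holds because the codes are both prefix-free and monotonically increasing: at the first symbol where two keys differ, the smaller symbol's code is smaller, and no code is a prefix of another. A real encoder would also pack the bits into bytes, which this sketch leaves out.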
Install Dependencies
sudo apt-get install build-essential cmake libgtest-dev
cd /usr/src/gtest
sudo cmake CMakeLists.txt
sudo make
sudo cp *.a /usr/lib
Build
mkdir build
cd build
cmake ..
make -j
Usage Example
A simple example can be found here. To run the example:
cd build
./example
Unit Tests
make test
Benchmark
./scripts/run_experiment.sh [OPTION]
We include a sample of the Wiki and URL datasets in this repository. To reproduce the results in our paper, download the full datasets (download links are in the paper) and use them in place of the samples. Our Email dataset is private, so you need to provide your own email list (email.txt) to run the corresponding experiments. The options below let you run a subset of the full benchmark:
Options
-r, --repeat_times=N
Run each experiment N times and report the average measurements. Default: 1.
--email, --wiki, --url
Run the benchmark using the Email/Wiki/URL dataset.
If unspecified, the script includes the Wiki and URL experiments.
--alldatasets
Include benchmarks for all three datasets.
--alm
Include the ALM-based encoders. The other encoders (Single, Double, 3-gram, 4-gram) are enabled by default.
--surf, --art, --hot, --btree, --prefixbtree
Run the SuRF/ART/HOT/B+tree/prefix B+tree benchmark suite.
--all
Run the full benchmark. If unspecified, the script only runs the microbenchmarks for Wiki and URL.
The script above records benchmark measurements under "results/". The master plotting script is under "scripts/", and the individual plotting scripts are under "plots/". Generated figures are written to "figures/". Make sure you run the benchmark with the --alm option enabled before using the plotting scripts.
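As a rough illustration of how the options listed above combine, the sketch below assembles a command line for run_experiment.sh. The `build_command` helper is hypothetical and not part of the repository; it only uses flags documented in the Options list, and it assumes the script accepts them together in any order. It builds the command but does not execute it.

```python
import shlex

SCRIPT = "./scripts/run_experiment.sh"
# Flags copied from the Options list above; nothing else is assumed to exist.
DATASET_FLAGS = ("--email", "--wiki", "--url")
SUITE_FLAGS = ("--surf", "--art", "--hot", "--btree", "--prefixbtree")


def build_command(datasets=(), suites=(), repeat_times=1, alm=False):
    """Return the argv list for one benchmark run (hypothetical helper)."""
    argv = [SCRIPT, f"--repeat_times={repeat_times}"]
    for flag in (*datasets, *suites):
        if flag not in DATASET_FLAGS + SUITE_FLAGS:
            raise ValueError(f"undocumented option: {flag}")
        argv.append(flag)
    if alm:
        # Required before using the plotting scripts, per the note above.
        argv.append("--alm")
    return argv


cmd = build_command(datasets=["--wiki"], suites=["--surf"],
                    repeat_times=3, alm=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root if you want to launch the run from Python.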
License
Copyright 2020, Carnegie Mellon University
Licensed under the Apache License 2.0.