'hyper'grep - a fork of ripgrep (rg) with hyperscan support
This is a fork of ripgrep adding support for hyperscan.
This can be useful if all of the following conditions apply to you:
- you have at least several hundred regexps to match simultaneously
- you have several dozen GB of data on a fast (>500 MB/s) disk, but little CPU (<2 cores) to spend on the regexp search
- your regexps rarely change, while the data to search changes often
The fork was born out of the need to extract a bunch of fediverse addresses from scraped web pages.
Here we only describe the differences between this fork and the original ripgrep. Please refer to the original readme for complete information.
Installation / Building
Compared to ripgrep, we only offer a basic installation method using cargo. You can install cargo if you don't have it already.
Beforehand, please read the corresponding section of the original readme.
Don't forget to install the hyperscan library and its headers on your system first. Most distributions provide ready-to-go packages (e.g. libhyperscan-dev on Debian/Ubuntu), or you can compile it from source.
Note that in some environments (e.g. AWS EC2), if you compile hyperscan from source you need to add -fPIC to the library's compilation flags.
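A minimal sketch of such a source build, assuming hyperscan's usual out-of-tree CMake workflow (sources at https://github.com/intel/hyperscan; the build also needs CMake, Ragel and the Boost headers). The helper name build_hyperscan_pic is hypothetical, and -fPIC is passed through the standard CMake variables CMAKE_C_FLAGS and CMAKE_CXX_FLAGS:

```shell
# Hypothetical helper: configure and build hyperscan out of tree with -fPIC.
# Usage: build_hyperscan_pic <hyperscan-source-dir> <build-dir>
build_hyperscan_pic() {
    src=$(cd "$1" && pwd) || return 1   # absolute path, since we cd below
    mkdir -p "$2" || return 1
    (
        cd "$2" &&
        cmake -DCMAKE_C_FLAGS=-fPIC -DCMAKE_CXX_FLAGS=-fPIC "$src" &&
        make -j"$(nproc)"
    )
}
```

For example, run build_hyperscan_pic ~/src/hyperscan ~/src/hyperscan-build, then make install (usually with sudo) in the build directory.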
Finally, check out this repository and compile the fork:
$ git clone http://git.sr.ht/~pierrenn/ripgrep
$ cd ripgrep
$ git submodule update --init --recursive
$ cargo install --path . --features 'hyperscan,pcre2' # if you want all 3 engines: default, pcre2, hyperscan
$ # cargo install --path . --features 'hyperscan' # or if you want only 2 engines: default, hyperscan
Don't forget to add Cargo's bin directory to your PATH.
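A minimal sketch for a POSIX shell, assuming Cargo's default layout ($CARGO_HOME/bin, i.e. ~/.cargo/bin when CARGO_HOME is unset):

```shell
# Cargo installs binaries into $CARGO_HOME/bin, which defaults to ~/.cargo/bin.
# Prepending it makes the freshly built rg win over any rg found later in PATH.
# Add this line to your shell profile (e.g. ~/.profile) to make it permanent.
export PATH="${CARGO_HOME:-$HOME/.cargo}/bin:$PATH"
```

Afterwards, command -v rg should print the path inside that directory.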
Note that the binary name for this fork of ripgrep is also rg, so it will overwrite the original binary (since we only add functionality, this shouldn't be a problem).
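If you would rather keep both binaries side by side, one option is cargo's --root flag, which installs binaries into <root>/bin instead of Cargo's bin directory. This is a hypothetical setup, not something the fork provides: the prefix ~/.local/hyperrg, the install_hyperrg function and the hrg alias are all illustrative names.

```shell
# Example prefix for the fork; any directory works.
HYPERRG_ROOT="$HOME/.local/hyperrg"

# Run from the root of the cloned repository, like the commands above.
# --root makes cargo put the binary in $HYPERRG_ROOT/bin/rg.
install_hyperrg() {
    cargo install --path . --features 'hyperscan,pcre2' --root "$HYPERRG_ROOT"
}

# Distinct name for the fork, so plain rg stays the original binary.
alias hrg="$HYPERRG_ROOT/bin/rg"
```

After install_hyperrg, hrg --engine hyperscan ... runs the fork; put the alias in your shell profile to keep it.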
Functionalities only available in this fork
TL;DR: we just add a new engine named hyperscan to ripgrep.
To use it:
$ rg --engine hyperscan "my pattern" my_file
or with a file of patterns:
$ rg --engine hyperscan -f myregexps my_file
where myregexps is either a compiled hyperscan DB or a list of regexps, one per line, in the standard format or in the hyperscan format, e.g.:
some default regexp
/some hyperscan regexp/imsHV8WcQ
where imsHV8WcQ can be any subset of the following (case-sensitive) options:
- HS_FLAG_CASELESS
- HS_FLAG_MULTILINE
- HS_FLAG_DOTALL
- HS_FLAG_SINGLEMATCH
- HS_FLAG_ALLOWEMPTY
- HS_FLAG_UTF8
- HS_FLAG_UCP
- HS_FLAG_COMBINATION
- HS_FLAG_QUIET
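A sketch of a pattern file mixing both formats, plus a hypothetical pre-flight check (check_hs_flags is not part of the fork) that reports unknown flag letters before the file reaches rg. The example patterns are illustrative fediverse-style address regexps, and reading i as HS_FLAG_CASELESS is an assumption based on hyperscan's own pattern-file conventions:

```shell
# 1) Pattern file with both formats. Line 1 is a plain ("default") regexp;
#    line 2 uses /pattern/flags, where i is assumed to mean HS_FLAG_CASELESS.
cat > myregexps <<'EOF'
acct:[A-Za-z0-9_]+@[A-Za-z0-9.-]+
/@[a-z0-9_]+@[a-z0-9.-]+\.[a-z]+/i
EOF

# 2) Hypothetical check: report lines whose /pattern/flags suffix uses a
#    letter outside imsHV8WcQ. Lines not of the form /.../... are treated
#    as default-format regexps and skipped.
check_hs_flags() {
    status=0
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        case $line in
            /*/*)                      # hyperscan format: /pattern/flags
                flags=${line##*/}      # everything after the last '/'
                case $flags in
                    *[!imsHV8WcQ]*)
                        echo "line $n: unsupported flag(s) '$flags'" >&2
                        status=1
                        ;;
                esac
                ;;
        esac
    done < "$1"
    return "$status"
}

check_hs_flags myregexps && echo "flags OK"
```

A non-zero exit status means at least one line uses a letter the fork does not list, so rg can be skipped until the file is fixed.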
We also provide the options --hyper-allow-empty, --hyper-utf8 and --hyper-ucp, which override the corresponding flag of every textual regular expression passed to hyperscan (they are ignored if you provide a compiled DB, as we don't support editing a DB). ripgrep's standard caseless, multiline and dotall options also override the options of all regexps (again, except when using a compiled hyperscan DB).
Finally, you can also save the compiled hyperscan database (DB) to disk. This is useful because ripgrep sometimes spends most of its run time compiling the DB (on a single core).
Use the -d/--hyper-write parameter to save the DB to disk before the search starts:
$ # tell rg to read the myregexps text file, compile the regexps, write them to db.hs and finally search my_file
$ rg --engine hyperscan -f myregexps -d db.hs my_file
$
$ # now tell rg to directly read the compiled DB and search my_file2; this will be quicker
$ rg --engine hyperscan -f db.hs my_file2
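Since the regexps rarely change while the data does, a small wrapper can decide when recompiling is needed. The sketch below is hypothetical (hs_search is not part of the fork) and only uses the -f and -d flags documented above: it rebuilds the DB when it is missing or older than the text regexp file, and otherwise loads the compiled DB. It relies on test -nt, which common shells such as bash and dash support although POSIX does not require it.

```shell
# Hypothetical wrapper around the fork's rg.
# Usage: hs_search <regexp-file> <db-path> <path>...
hs_search() {
    regexps=$1
    db=$2
    shift 2
    if [ ! -f "$db" ] || [ "$regexps" -nt "$db" ]; then
        # Slow path: compile the text regexps, write them to $db, then search.
        rg --engine hyperscan -f "$regexps" -d "$db" "$@"
    else
        # Fast path: load the already compiled DB.
        rg --engine hyperscan -f "$db" "$@"
    fi
}
```

For example, hs_search myregexps db.hs my_file writes db.hs on the first call; later calls such as hs_search myregexps db.hs my_file2 reuse it until myregexps is edited.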
Others
Please refer to the original readme.