Harness the power of hyperscan on the CLI with this fork of ripgrep

Category: IT Technology · Published: 5 years ago

'hyper'grep - a fork of ripgrep (rg) with hyperscan support

This is a fork of ripgrep that adds support for hyperscan.

This can be useful if you have all of the following:

  1. at least several hundred regexps to match simultaneously
  2. several dozen GB of data on a fast (>500 MB/s) disk / few CPUs (<2) to spend on the regexp search
  3. regexps that rarely change, while the data to parse changes often

The fork was born out of the need to extract a bunch of fediverse addresses from scraped web pages.

We only describe here the differences between this fork and the original ripgrep. Please refer to the original README for complete information.

Installation / Building

Compared to ripgrep, we only offer basic installation facilities, i.e. building with cargo. You can install cargo if you don't have it already.

Before building, please refer to the corresponding section of the original README.

Don't forget to install the hyperscan library and sources on your system first. Most distributions provide ready-to-go packages (e.g. libhyperscan-dev on Debian/Ubuntu), or you can compile it from source.

Note that in some environments (e.g. AWS EC2), if you compile hyperscan from source you need to add -fPIC to the library compilation flags.
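A minimal sketch of such a build, assuming hyperscan is configured with CMake and that its sources sit in a hypothetical ./hyperscan directory. CMAKE_C_FLAGS and CMAKE_CXX_FLAGS are standard CMake variables; the paths are placeholders, and the other build dependencies of hyperscan are not covered here:

```shell
#!/bin/sh
# Hypothetical locations; adjust to where you unpacked the hyperscan sources.
HS_SRC="${HS_SRC:-./hyperscan}"
HS_BUILD="${HS_BUILD:-./hyperscan-build}"

# Ask the C and C++ compilers to emit position-independent code.
HS_CMAKE_ARGS="-DCMAKE_C_FLAGS=-fPIC -DCMAKE_CXX_FLAGS=-fPIC"

if [ -f "$HS_SRC/CMakeLists.txt" ]; then
    # -S/-B require CMake 3.13 or newer.
    cmake -S "$HS_SRC" -B "$HS_BUILD" $HS_CMAKE_ARGS && cmake --build "$HS_BUILD"
else
    echo "no hyperscan sources found in $HS_SRC; nothing built" >&2
fi
```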

Finally, check out this repository and compile the fork:

$ git clone http://git.sr.ht/~pierrenn/ripgrep
$ cd ripgrep
$ git submodule update --init --recursive
$ cargo install --path . --features 'hyperscan,pcre2' # if you want all 3 engines: default,pcre2,hyperscan
$ # cargo install --path . --features 'hyperscan' # or if you want only 2 engines: default,hyperscan

And don't forget to add Cargo's bin directory to your PATH.
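For a POSIX-style shell this can be sketched as follows, assuming Cargo's default install location $HOME/.cargo/bin (the location differs if CARGO_HOME or CARGO_INSTALL_ROOT is set):

```shell
# Assumed default location of binaries installed by `cargo install`.
CARGO_BIN="$HOME/.cargo/bin"

# Prepend it to PATH only if it is not already there.
case ":$PATH:" in
    *":$CARGO_BIN:"*) ;;
    *) PATH="$CARGO_BIN:$PATH"; export PATH ;;
esac
```

Putting these lines in your shell's startup file (e.g. ~/.profile) makes the change permanent.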

Note that the binary for this fork of ripgrep is also named rg, so it will overwrite the original binary (since we only add functionality, this shouldn't be a problem).

Functionalities only available in this fork

TLDR: We just add a new engine named hyperscan to ripgrep.

To use it: $ rg --engine hyperscan "my pattern" my_file

or via a file: $ rg --engine hyperscan -f myregexps my_file

Where myregexps is a compiled hyperscan DB or a list of regexps in the standard format or the hyperscan format, e.g.:

some default regexp
/some hyperscan regexp/imsHV8WcQ

where imsHV8WcQ can be any subset of the following (case-sensitive) options, listed in the same order as the letters:

i: HS_FLAG_CASELESS
m: HS_FLAG_MULTILINE
s: HS_FLAG_DOTALL
H: HS_FLAG_SINGLEMATCH
V: HS_FLAG_ALLOWEMPTY
8: HS_FLAG_UTF8
W: HS_FLAG_UCP
c: HS_FLAG_COMBINATION
Q: HS_FLAG_QUIET
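
As an illustration, the sketch below writes a small pattern file mixing both formats. The patterns and the file name myregexps are made up for the example, and it assumes the flag letters follow the convention of hyperscan's own tooling, where i selects HS_FLAG_CASELESS and 8 selects HS_FLAG_UTF8:

```shell
# Write a hypothetical pattern file; the quoted 'EOF' keeps backslashes literal.
cat > myregexps <<'EOF'
@[a-z0-9_]+@[a-z0-9.-]+\.[a-z]+
/mastodon|pleroma/i8
EOF

# Count the patterns the file provides (one per line).
PATTERN_COUNT=$(wc -l < myregexps | tr -d ' ')
echo "wrote $PATTERN_COUNT patterns to myregexps"
```

The file can then be passed to the fork with rg --engine hyperscan -f myregexps my_file.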

We also provide the options --hyper-allow-empty, --hyper-utf8 and --hyper-ucp to override the corresponding flag of every textual regular expression passed to hyperscan (they are ignored if you provide a compiled DB, as we don't support editing a DB). ripgrep's standard caseless, multiline and dotall options also override the flags of all regexps (again, except when using a compiled hyperscan DB).

Finally, you can also save a compiled DB to disk. This can be useful because compiling the DB, which happens on a single core, is sometimes where ripgrep spends most of its time. Use the -d/--hyper-write parameter to save the DB to disk before starting the search:

$ # tell rg to read the myregexps text file, compile the regexps, write them to db.hs and finally search my_file
$ rg --engine hyperscan -f myregexps -d db.hs my_file
$
$ # now tell rg to directly read the compiled DB and search my_file2 - this will be quicker
$ rg --engine hyperscan -f db.hs my_file2
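
Since compilation is the expensive step, one way to combine the two commands above is to recompile only when the text file is newer than the saved DB. The sketch below only chooses the arguments and prints the resulting command rather than running it; the helper name hs_args and the file names are hypothetical:

```shell
# Print the rg arguments to use for a given pattern file and DB file.
# Recompile (and save with -d) when the DB is missing or older than the patterns.
# Note: `-nt` is not strictly POSIX but is supported by bash, dash, ksh and zsh.
hs_args() {
    patterns=$1
    db=$2
    if [ ! -f "$db" ] || [ "$patterns" -nt "$db" ]; then
        echo "--engine hyperscan -f $patterns -d $db"
    else
        echo "--engine hyperscan -f $db"
    fi
}

# Example: show the command that would search my_file.
echo "rg $(hs_args myregexps db.hs) my_file"
```

This fits the use case of regexps that rarely change: the DB is rebuilt once after each edit of the pattern file and reused for every later search.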

Others

Please refer to the original README.

