SAT solver on top of regex matcher

栏目: IT技术 · 发布时间: 4年前

内容简介：A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex 'backreferences' extension extends regex matching to a (hard) NP-problem. Backreferences usually denoted as \1, \2, etc.Perhaps, the most practical use of backrefer

A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex 'backreferences' extension extends regex matching to a (hard) NP-problem. Backreferences usually denoted as \1, \2, etc.

Perhaps, the most practical use of backreferences I've heard is HTML tag matching (this regex isn't escaped properly):

<(.*)> ... </\1>

To match successfully, the second group must coincide with the first, like "<b>test</b>", but not "<a>test</b>". Another practical usage I've heard: match "string" or 'string', but not "string'.

Also, you can find two longest repeating substrings in an input string . See also .

Some other uses are very arcane: divide input number by sqrt(2) , detecting factorial number .

This post has been inspired by Reduction of 3-CNF-SAT to Perl Regular Expression Matching , please read it first. However the author incorrectly states that only 3SAT problems are solvable. In fact, any SAT instances are solvable, consisting of clauses of arbitrary sizes.

Also, since my SAT/CNF instances usually has more variables than 9 and I can't use backreferences like \1 ... \9, I use different method (Python): The syntax for backreferences in an expression such as (...)\1 refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: (?P=name) indicates that the contents of the group called name should again be matched at the current point. The regular expression for finding doubled words, \b(\w+)\s+\1\b can also be written as \b(?P \w+)\s+(?P=word)\b

See also .

Now let's take this small CNF instance:

p cnf 5 11
-2 -5 0
-2 -4 0
-4 -5 0
-2 -3 0
-1 -4 0
-1 -5 0
-1 -2 0
-1 -3 0
-3 -4 0
-3 -5 0
1 2 3 4 5 0

This is what I call "popcnt1": only 1 variable must be true, all the rest are always false.

% picosat --all popcnt1.cnf
  
s SATISFIABLE
v -1 -2 -3 -4 5 0
s SATISFIABLE
v -1 -2 -3 4 -5 0
s SATISFIABLE
v -1 -2 3 -4 -5 0
s SATISFIABLE
v -1 2 -3 -4 -5 0
s SATISFIABLE
v 1 -2 -3 -4 -5 0
s SOLUTIONS 5

Now let's translate it to regex:

string=xxxxx;x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?).*;(?:(?P=a_2)x|(?P=a_5)x),(?:(?P=a_2)x|(?P=a_4)x),(?:(?P=a_4)x|(?P=a_5)x),(?:(?P=a_2)x|(?P=a_3)x),(?:(?P=a_1)x|(?P=a_4)x),(?:(?P=a_1)x|(?P=a_5)x),(?:(?P=a_1)x|(?P=a_2)x),(?:(?P=a_1)x|(?P=a_3)x),(?:(?P=a_3)x|(?P=a_4)x),(?:(?P=a_3)x|(?P=a_5)x),(?:(?P=a_1)|(?P=a_2)|(?P=a_3)|(?P=a_4)|(?P=a_5)),
SAT
1 -2 -3 -4 -5

You see: an input string has as many x's as many variables the CNF instance has. Then, as many "x," substrings, as many clauses the instance has.

Take the first part of the pattern:

^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?).*; ...

Roughly, it means that the group can match "x", or may not. Let matcher decide on its own. The second part of the pattern are clauses. For the "1 2 3 4 5" clause we have "(?:(?P=a_1)|(?P=a_2)|(?P=a_3)|(?P=a_4)|(?P=a_5))," there. That means, that the whole group must match "x", but we don't tell how: one of 5 subgroups may match, and each subgroup is actually backreference to the first part of pattern. But these terms are all positive. What about negative terms?

For the "-2 -5" clause, we have "(?:(?P=a_2)x|(?P=a_5)x)," there. That means that the whole group must match "x", and again, we don't tell how, but both backreferences a_2 and a_5 are prohibited, if present simultaneously. It's OK for a_2 match "x", but then the second part of the union would match. It's OK for a_5 match "x", and then the first part of the union would match. Also, it's OK for both a_2 and a_5 match nothing: either part of union will match "x" then.

If regex can't match, this means the input instance is unsatisfiable.

Likewise, I can run my popcnt4 instance, which commands to be 4 of input 8 variables be true:

string=xxxxxxxx;x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?)(?P<a_6>x?)(?P<a_7>x?)(?P<a_8>x?).*;(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_5)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_6)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_7)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_8)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_5)x|(?P=a_6)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_5)x|(?P=

...

P=a_7)|(?P=a_8)),(?:(?P=a_3)|(?P=a_5)|(?P=a_6)|(?P=a_7)|(?P=a_8)),(?:(?P=a_4)x|(?P=a_5)x|(?P=a_6)x|(?P=a_7)x|(?P=a_8)x),(?:(?P=a_4)|(?P=a_5)|(?P=a_6)|(?P=a_7)|(?P=a_8)),
SAT
1 2 3 4 -5 -6 -7 -8

Also, I managed to solve the "Fred puzzle" frommy book (Ctrl-F: "Fred puzzle") in SAT/CNF form:

string=xxxxxxxxxxxxxxxxxxxxxxxxxxxx;x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?)(?P<a_6>x?)(?P<a_7>x?)(?P<a_8>x?)(?P<a_9>x?)(?P<a_10>x?)(?P<a_11>x?)(?P<a_12>x?)(?P<a_13>x?)(?P<a_14>x?)(?P<a_15>x?)(?P<a_16>x?)(?P<a_17>x?)(?P<a_18>x?)(?P<a_19>x?)(?P<a_20>x?)(?P<a_21>x?)(?P<a_22>x?)(?P<a_23>x?)(?P<a_24>x?)(?P<a_25>x?)(?P<a_26>x?)(?P<a_27>x?)(?P<a_28>x?).*;(?:(?P=a_1)x),(?:(?P=a_2)),(?:(?P=a_3)x|(?P=a_4)x|(?P=a_9)x),(?:(?P=a_3)|(?P=a_4)|(?P=a_9)x),(?:(?P=a_3)|(?P=a_4)x|(?P=a_9)),(?:(?P=a_3)x|(?P=a_4)|(?P=a_9)),(?:(?P=a_10)x|(?P=a_9)x),(?:(?P=a_10)|(?P=a_9)),(?:(?P=a_11)x|(?P=a_10)x),(?:(?P=a_11)|(?P=a_10)),(?:(?P=a_11)),(?:(?P=a_5)x|(?P=a_6)x|(?P=a_12)x),(?:(?P=a_5)|(?P=a_6)|(?P=a_12)x),(?:(?P=a_5)|(?P=a_6)x|(?P=a_12)),(?:(?P=a_5)x|(?P=a_6)|(?P=a_12)),(?:(?P=a_13)x|(?P=a_12)x),(?:(?P=a_13)|(?P=a_12)),(?:(?P=a_14)x|(?P=a_13)x),(?:(?P=a_14)|(?P=a_13)),(?:(?P=a_14)),(?:(?P=a_7)x|(?P=a_8)x|(?P=a_15)x),(?:(?P=a_7)|(?P=a_8)|(?P=a_15)x),(?:(?P=a_7)|(?P=a_8)x|(?P=a_15)),(?:(?P=a_7)x|(?P=a_8)|(?P=a_15)),(?:(?P=a_16)x|(?P=a_15)x),(?:(?P=a_16)|(?P=a_15)),(?:(?P=a_17)x|(?P=a_16)x),(?:(?P=a_17)|(?P=a_16)),(?:(?P=a_17)),(?:(?P=a_3)x|(?P=a_6)x|(?P=a_18)),(?:(?P=a_3)|(?P=a_18)x),(?:(?P=a_6)|(?P=a_18)x),(?:(?P=a_8)x|(?P=a_18)x|(?P=a_19)x),(?:(?P=a_8)|(?P=a_18)|(?P=a_19)x),(?:(?P=a_8)|(?P=a_18)x|(?P=a_19)),(?:(?P=a_8)x|(?P=a_18)|(?P=a_19)),(?:(?P=a_20)x|(?P=a_19)x),(?:(?P=a_20)|(?P=a_19)),(?:(?P=a_20)),(?:(?P=a_21)x|(?P=a_7)x),(?:(?P=a_21)|(?P=a_7)),(?:(?P=a_21)|(?P=a_5)|(?P=a_22)x),(?:(?P=a_21)x|(?P=a_22)),(?:(?P=a_5)x|(?P=a_22)),(?:(?P=a_4)x|(?P=a_22)x|(?P=a_23)x),(?:(?P=a_4)|(?P=a_22)|(?P=a_23)x),(?:(?P=a_4)|(?P=a_22)x|(?P=a_23)),(?:(?P=a_4)x|(?P=a_22)|(?P=a_23)),(?:(?P=a_24)x|(?P=a_23)x),(?:(?P=a_24)|(?P=a_23)),(?:(?P=a_24)),(?:(?P=a_3)|(?P=a_7)|(?P=a_25)x),(?:(?P=a_3)x|(?P=a_25)),(?:(?P=a_7)x|(?P=a_25)),(?:(?P=a_6)x|(?P=a_25)x|(?P=a_26)),(?:(?P=a_6)|(?P=a_26)x),(?:(?P=a_25)|(?P=a_26)x),(?:(?P=a_6)x|(?P=a_26)x|(?P=a_27)x),(?:(?P=a_6)|(?P=a_26)|(?P=a_27)x),(?:(?P=a_6)|(?P=a_26)x|(?P=a_27)),(?:(?P=a_6)x|(?P=a_26)|(?P=a_27)),(?:(?P=a_28)x|(?P=a_27)x),(?:(?P=a_28)|(?P=a_27)),(?:(?P=a_28)),
SAT
-1 2 -3 4 5 -6 7 -8 9 -10 11 12 -13 14 15 -16 17 -18 -19 20 -21 22 -23 24 25 -26 -27 28

It took ~3 minutes on my old CPU clocked at ~2GHz.

The files, including the "solver" in Python3.

Of course, all this stuff isn't practical at all. But it demonstrates reduction from one problem (regex matching with backreferences) to another (SAT). Find a better algorithm for any of these problem and this would lead to revolution in computer science .

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

SAT solver on top of regex matcher

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

算法学

哈雷尔 / 第1版 (2006年2月1日) / 2006年2月1日 / 38.0

本书的意图在于按序学习或研究，而不是作为一个参考。因而按照每章依赖于前面章节的结构组织本书，且流畅易读。第一部分预备知识中的大部分材料对于那些具有程序设计背景的人是熟悉的。无论是否恰当，本书包含了计算机科学家当前感兴趣的研究专题的简明讨论。这本教科书的书后有每章详细参考书目的注记，并通过“后向”指针把教科书中的讨论与相关文献联系起来。目前的版本包含大量习题，以及大约三分之一的题解。可用题解作为教科......一起来看看《算法学》这本书的介绍吧!

码农工具

SAT solver on top of regex matcher

算法学

JSON 在线解析

MD5 加密

HEX HSV 转换工具