SAT solver on top of regex matcher

栏目: IT技术 · 发布时间: 5年前

内容简介:A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex 'backreferences' extension extends regex matching to a (hard) NP-problem. Backreferences usually denoted as \1, \2, etc.Perhaps, the most practical use of backrefer

A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex 'backreferences' extension extends regex matching to a (hard) NP-problem. Backreferences usually denoted as \1, \2, etc.

Perhaps, the most practical use of backreferences I've heard is HTML tag matching (this regex isn't escaped properly):

<(.*)> ... </\1>

To match successfully, the second group must coincide with the first, like "<b>test</b>", but not "<a>test</b>". Another practical usage I've heard: match "string" or 'string', but not "string'.

Also, you can find two longest repeating substrings in an input string . See also .

Some other uses are very arcane: divide input number by sqrt(2) , detecting factorial number .

This post has been inspired by Reduction of 3-CNF-SAT to Perl Regular Expression Matching , please read it first. However the author incorrectly states that only 3SAT problems are solvable. In fact, any SAT instances are solvable, consisting of clauses of arbitrary sizes.

Also, since my SAT/CNF instances usually has more variables than 9 and I can't use backreferences like \1 ... \9, I use different method (Python): The syntax for backreferences in an expression such as (...)\1 refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: (?P=name) indicates that the contents of the group called name should again be matched at the current point. The regular expression for finding doubled words, \b(\w+)\s+\1\b can also be written as \b(?P \w+)\s+(?P=word)\b

See also .

Now let's take this small CNF instance:

p cnf 5 11
-2 -5 0
-2 -4 0
-4 -5 0
-2 -3 0
-1 -4 0
-1 -5 0
-1 -2 0
-1 -3 0
-3 -4 0
-3 -5 0
1 2 3 4 5 0

This is what I call "popcnt1": only 1 variable must be true, all the rest are always false.

% picosat --all popcnt1.cnf
  
s SATISFIABLE
v -1 -2 -3 -4 5 0
s SATISFIABLE
v -1 -2 -3 4 -5 0
s SATISFIABLE
v -1 -2 3 -4 -5 0
s SATISFIABLE
v -1 2 -3 -4 -5 0
s SATISFIABLE
v 1 -2 -3 -4 -5 0
s SOLUTIONS 5

Now let's translate it to regex:

string=xxxxx;x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?).*;(?:(?P=a_2)x|(?P=a_5)x),(?:(?P=a_2)x|(?P=a_4)x),(?:(?P=a_4)x|(?P=a_5)x),(?:(?P=a_2)x|(?P=a_3)x),(?:(?P=a_1)x|(?P=a_4)x),(?:(?P=a_1)x|(?P=a_5)x),(?:(?P=a_1)x|(?P=a_2)x),(?:(?P=a_1)x|(?P=a_3)x),(?:(?P=a_3)x|(?P=a_4)x),(?:(?P=a_3)x|(?P=a_5)x),(?:(?P=a_1)|(?P=a_2)|(?P=a_3)|(?P=a_4)|(?P=a_5)),
SAT
1 -2 -3 -4 -5

You see: an input string has as many x's as many variables the CNF instance has. Then, as many "x," substrings, as many clauses the instance has.

Take the first part of the pattern:

^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?).*; ...

Roughly, it means that the group can match "x", or may not. Let matcher decide on its own. The second part of the pattern are clauses. For the "1 2 3 4 5" clause we have "(?:(?P=a_1)|(?P=a_2)|(?P=a_3)|(?P=a_4)|(?P=a_5))," there. That means, that the whole group must match "x", but we don't tell how: one of 5 subgroups may match, and each subgroup is actually backreference to the first part of pattern. But these terms are all positive. What about negative terms?

For the "-2 -5" clause, we have "(?:(?P=a_2)x|(?P=a_5)x)," there. That means that the whole group must match "x", and again, we don't tell how, but both backreferences a_2 and a_5 are prohibited, if present simultaneously. It's OK for a_2 match "x", but then the second part of the union would match. It's OK for a_5 match "x", and then the first part of the union would match. Also, it's OK for both a_2 and a_5 match nothing: either part of union will match "x" then.

If regex can't match, this means the input instance is unsatisfiable.

Likewise, I can run my popcnt4 instance, which commands to be 4 of input 8 variables be true:

string=xxxxxxxx;x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?)(?P<a_6>x?)(?P<a_7>x?)(?P<a_8>x?).*;(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_5)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_6)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_7)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_8)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_5)x|(?P=a_6)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_5)x|(?P=

...

P=a_7)|(?P=a_8)),(?:(?P=a_3)|(?P=a_5)|(?P=a_6)|(?P=a_7)|(?P=a_8)),(?:(?P=a_4)x|(?P=a_5)x|(?P=a_6)x|(?P=a_7)x|(?P=a_8)x),(?:(?P=a_4)|(?P=a_5)|(?P=a_6)|(?P=a_7)|(?P=a_8)),
SAT
1 2 3 4 -5 -6 -7 -8

Also, I managed to solve the "Fred puzzle" frommy book (Ctrl-F: "Fred puzzle") in SAT/CNF form:

string=xxxxxxxxxxxxxxxxxxxxxxxxxxxx;x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?)(?P<a_6>x?)(?P<a_7>x?)(?P<a_8>x?)(?P<a_9>x?)(?P<a_10>x?)(?P<a_11>x?)(?P<a_12>x?)(?P<a_13>x?)(?P<a_14>x?)(?P<a_15>x?)(?P<a_16>x?)(?P<a_17>x?)(?P<a_18>x?)(?P<a_19>x?)(?P<a_20>x?)(?P<a_21>x?)(?P<a_22>x?)(?P<a_23>x?)(?P<a_24>x?)(?P<a_25>x?)(?P<a_26>x?)(?P<a_27>x?)(?P<a_28>x?).*;(?:(?P=a_1)x),(?:(?P=a_2)),(?:(?P=a_3)x|(?P=a_4)x|(?P=a_9)x),(?:(?P=a_3)|(?P=a_4)|(?P=a_9)x),(?:(?P=a_3)|(?P=a_4)x|(?P=a_9)),(?:(?P=a_3)x|(?P=a_4)|(?P=a_9)),(?:(?P=a_10)x|(?P=a_9)x),(?:(?P=a_10)|(?P=a_9)),(?:(?P=a_11)x|(?P=a_10)x),(?:(?P=a_11)|(?P=a_10)),(?:(?P=a_11)),(?:(?P=a_5)x|(?P=a_6)x|(?P=a_12)x),(?:(?P=a_5)|(?P=a_6)|(?P=a_12)x),(?:(?P=a_5)|(?P=a_6)x|(?P=a_12)),(?:(?P=a_5)x|(?P=a_6)|(?P=a_12)),(?:(?P=a_13)x|(?P=a_12)x),(?:(?P=a_13)|(?P=a_12)),(?:(?P=a_14)x|(?P=a_13)x),(?:(?P=a_14)|(?P=a_13)),(?:(?P=a_14)),(?:(?P=a_7)x|(?P=a_8)x|(?P=a_15)x),(?:(?P=a_7)|(?P=a_8)|(?P=a_15)x),(?:(?P=a_7)|(?P=a_8)x|(?P=a_15)),(?:(?P=a_7)x|(?P=a_8)|(?P=a_15)),(?:(?P=a_16)x|(?P=a_15)x),(?:(?P=a_16)|(?P=a_15)),(?:(?P=a_17)x|(?P=a_16)x),(?:(?P=a_17)|(?P=a_16)),(?:(?P=a_17)),(?:(?P=a_3)x|(?P=a_6)x|(?P=a_18)),(?:(?P=a_3)|(?P=a_18)x),(?:(?P=a_6)|(?P=a_18)x),(?:(?P=a_8)x|(?P=a_18)x|(?P=a_19)x),(?:(?P=a_8)|(?P=a_18)|(?P=a_19)x),(?:(?P=a_8)|(?P=a_18)x|(?P=a_19)),(?:(?P=a_8)x|(?P=a_18)|(?P=a_19)),(?:(?P=a_20)x|(?P=a_19)x),(?:(?P=a_20)|(?P=a_19)),(?:(?P=a_20)),(?:(?P=a_21)x|(?P=a_7)x),(?:(?P=a_21)|(?P=a_7)),(?:(?P=a_21)|(?P=a_5)|(?P=a_22)x),(?:(?P=a_21)x|(?P=a_22)),(?:(?P=a_5)x|(?P=a_22)),(?:(?P=a_4)x|(?P=a_22)x|(?P=a_23)x),(?:(?P=a_4)|(?P=a_22)|(?P=a_23)x),(?:(?P=a_4)|(?P=a_22)x|(?P=a_23)),(?:(?P=a_4)x|(?P=a_22)|(?P=a_23)),(?:(?P=a_24)x|(?P=a_23)x),(?:(?P=a_24)|(?P=a_23)),(?:(?P=a_24)),(?:(?P=a_3)|(?P=a_7)|(?P=a_25)x),(?:(?P=a_3)x|(?P=a_25)),(?:(?P=a_7)x|(?P=a_25)),(?:(?P=a_6)x|(?P=a_25)x|(?P=a_26)),(?:(?P=a_6)|(?P=a_26)x),(?:(?P=a_25)|(?P=a_26)x),(?:(?P=a_6)x|(?P=a_26)x|(?P=a_27)x),(?:(?P=a_6)|(?P=a_26)|(?P=a_27)x),(?:(?P=a_6)|(?P=a_26)x|(?P=a_27)),(?:(?P=a_6)x|(?P=a_26)|(?P=a_27)),(?:(?P=a_28)x|(?P=a_27)x),(?:(?P=a_28)|(?P=a_27)),(?:(?P=a_28)),
SAT
-1 2 -3 4 5 -6 7 -8 9 -10 11 12 -13 14 15 -16 17 -18 -19 20 -21 22 -23 24 25 -26 -27 28

It took ~3 minutes on my old CPU clocked at ~2GHz.

The files, including the "solver" in Python3.

Of course, all this stuff isn't practical at all. But it demonstrates reduction from one problem (regex matching with backreferences) to another (SAT). Find a better algorithm for any of these problem and this would lead to revolution in computer science .


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

一网打尽

一网打尽

[美]布拉德·斯通 / 李晶、李静 / 中信出版社 / 2014-1-15 / 49.00元

亚马逊最早起步于通过邮购来经营图书业务。但贝佐斯却不满足于仅做一名书商,他希望缔造亚马逊万货商店的神话——能提供海量的货源,并以超低的价格提供最具吸引力的便捷服务。为了实现这一诺言,他发展了一种企业文化,这种文化蕴含着执着的雄心与难以破解 的秘诀。亚马逊的这 一文化现在依旧在发扬光大。 布拉德·斯通非常幸运地得到采访亚马逊的前任和现任高管、员工以及贝佐斯本人、家人的机会,使我们第一次有机会深......一起来看看 《一网打尽》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

MD5 加密
MD5 加密

MD5 加密工具

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试