SAT solver on top of regex matcher

栏目: IT技术 · 发布时间: 4年前

内容简介:A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex 'backreferences' extension extends regex matching to a (hard) NP-problem. Backreferences usually denoted as \1, \2, etc.Perhaps, the most practical use of backrefer

A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex 'backreferences' extension extends regex matching to a (hard) NP-problem. Backreferences usually denoted as \1, \2, etc.

Perhaps, the most practical use of backreferences I've heard is HTML tag matching (this regex isn't escaped properly):

<(.*)> ... </\1>

To match successfully, the second group must coincide with the first, like "<b>test</b>", but not "<a>test</b>". Another practical usage I've heard: match "string" or 'string', but not "string'.

Also, you can find two longest repeating substrings in an input string . See also .

Some other uses are very arcane: divide input number by sqrt(2) , detecting factorial number .

This post has been inspired by Reduction of 3-CNF-SAT to Perl Regular Expression Matching , please read it first. However the author incorrectly states that only 3SAT problems are solvable. In fact, any SAT instances are solvable, consisting of clauses of arbitrary sizes.

Also, since my SAT/CNF instances usually has more variables than 9 and I can't use backreferences like \1 ... \9, I use different method (Python): The syntax for backreferences in an expression such as (...)\1 refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: (?P=name) indicates that the contents of the group called name should again be matched at the current point. The regular expression for finding doubled words, \b(\w+)\s+\1\b can also be written as \b(?P \w+)\s+(?P=word)\b

See also .

Now let's take this small CNF instance:

p cnf 5 11
-2 -5 0
-2 -4 0
-4 -5 0
-2 -3 0
-1 -4 0
-1 -5 0
-1 -2 0
-1 -3 0
-3 -4 0
-3 -5 0
1 2 3 4 5 0

This is what I call "popcnt1": only 1 variable must be true, all the rest are always false.

% picosat --all popcnt1.cnf
  
s SATISFIABLE
v -1 -2 -3 -4 5 0
s SATISFIABLE
v -1 -2 -3 4 -5 0
s SATISFIABLE
v -1 -2 3 -4 -5 0
s SATISFIABLE
v -1 2 -3 -4 -5 0
s SATISFIABLE
v 1 -2 -3 -4 -5 0
s SOLUTIONS 5

Now let's translate it to regex:

string=xxxxx;x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?).*;(?:(?P=a_2)x|(?P=a_5)x),(?:(?P=a_2)x|(?P=a_4)x),(?:(?P=a_4)x|(?P=a_5)x),(?:(?P=a_2)x|(?P=a_3)x),(?:(?P=a_1)x|(?P=a_4)x),(?:(?P=a_1)x|(?P=a_5)x),(?:(?P=a_1)x|(?P=a_2)x),(?:(?P=a_1)x|(?P=a_3)x),(?:(?P=a_3)x|(?P=a_4)x),(?:(?P=a_3)x|(?P=a_5)x),(?:(?P=a_1)|(?P=a_2)|(?P=a_3)|(?P=a_4)|(?P=a_5)),
SAT
1 -2 -3 -4 -5

You see: an input string has as many x's as many variables the CNF instance has. Then, as many "x," substrings, as many clauses the instance has.

Take the first part of the pattern:

^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?).*; ...

Roughly, it means that the group can match "x", or may not. Let matcher decide on its own. The second part of the pattern are clauses. For the "1 2 3 4 5" clause we have "(?:(?P=a_1)|(?P=a_2)|(?P=a_3)|(?P=a_4)|(?P=a_5))," there. That means, that the whole group must match "x", but we don't tell how: one of 5 subgroups may match, and each subgroup is actually backreference to the first part of pattern. But these terms are all positive. What about negative terms?

For the "-2 -5" clause, we have "(?:(?P=a_2)x|(?P=a_5)x)," there. That means that the whole group must match "x", and again, we don't tell how, but both backreferences a_2 and a_5 are prohibited, if present simultaneously. It's OK for a_2 match "x", but then the second part of the union would match. It's OK for a_5 match "x", and then the first part of the union would match. Also, it's OK for both a_2 and a_5 match nothing: either part of union will match "x" then.

If regex can't match, this means the input instance is unsatisfiable.

Likewise, I can run my popcnt4 instance, which commands to be 4 of input 8 variables be true:

string=xxxxxxxx;x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?)(?P<a_6>x?)(?P<a_7>x?)(?P<a_8>x?).*;(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_5)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_6)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_7)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_4)x|(?P=a_8)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_5)x|(?P=a_6)x),(?:(?P=a_1)x|(?P=a_2)x|(?P=a_3)x|(?P=a_5)x|(?P=

...

P=a_7)|(?P=a_8)),(?:(?P=a_3)|(?P=a_5)|(?P=a_6)|(?P=a_7)|(?P=a_8)),(?:(?P=a_4)x|(?P=a_5)x|(?P=a_6)x|(?P=a_7)x|(?P=a_8)x),(?:(?P=a_4)|(?P=a_5)|(?P=a_6)|(?P=a_7)|(?P=a_8)),
SAT
1 2 3 4 -5 -6 -7 -8

Also, I managed to solve the "Fred puzzle" frommy book (Ctrl-F: "Fred puzzle") in SAT/CNF form:

string=xxxxxxxxxxxxxxxxxxxxxxxxxxxx;x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,
pattern=^(?P<a_1>x?)(?P<a_2>x?)(?P<a_3>x?)(?P<a_4>x?)(?P<a_5>x?)(?P<a_6>x?)(?P<a_7>x?)(?P<a_8>x?)(?P<a_9>x?)(?P<a_10>x?)(?P<a_11>x?)(?P<a_12>x?)(?P<a_13>x?)(?P<a_14>x?)(?P<a_15>x?)(?P<a_16>x?)(?P<a_17>x?)(?P<a_18>x?)(?P<a_19>x?)(?P<a_20>x?)(?P<a_21>x?)(?P<a_22>x?)(?P<a_23>x?)(?P<a_24>x?)(?P<a_25>x?)(?P<a_26>x?)(?P<a_27>x?)(?P<a_28>x?).*;(?:(?P=a_1)x),(?:(?P=a_2)),(?:(?P=a_3)x|(?P=a_4)x|(?P=a_9)x),(?:(?P=a_3)|(?P=a_4)|(?P=a_9)x),(?:(?P=a_3)|(?P=a_4)x|(?P=a_9)),(?:(?P=a_3)x|(?P=a_4)|(?P=a_9)),(?:(?P=a_10)x|(?P=a_9)x),(?:(?P=a_10)|(?P=a_9)),(?:(?P=a_11)x|(?P=a_10)x),(?:(?P=a_11)|(?P=a_10)),(?:(?P=a_11)),(?:(?P=a_5)x|(?P=a_6)x|(?P=a_12)x),(?:(?P=a_5)|(?P=a_6)|(?P=a_12)x),(?:(?P=a_5)|(?P=a_6)x|(?P=a_12)),(?:(?P=a_5)x|(?P=a_6)|(?P=a_12)),(?:(?P=a_13)x|(?P=a_12)x),(?:(?P=a_13)|(?P=a_12)),(?:(?P=a_14)x|(?P=a_13)x),(?:(?P=a_14)|(?P=a_13)),(?:(?P=a_14)),(?:(?P=a_7)x|(?P=a_8)x|(?P=a_15)x),(?:(?P=a_7)|(?P=a_8)|(?P=a_15)x),(?:(?P=a_7)|(?P=a_8)x|(?P=a_15)),(?:(?P=a_7)x|(?P=a_8)|(?P=a_15)),(?:(?P=a_16)x|(?P=a_15)x),(?:(?P=a_16)|(?P=a_15)),(?:(?P=a_17)x|(?P=a_16)x),(?:(?P=a_17)|(?P=a_16)),(?:(?P=a_17)),(?:(?P=a_3)x|(?P=a_6)x|(?P=a_18)),(?:(?P=a_3)|(?P=a_18)x),(?:(?P=a_6)|(?P=a_18)x),(?:(?P=a_8)x|(?P=a_18)x|(?P=a_19)x),(?:(?P=a_8)|(?P=a_18)|(?P=a_19)x),(?:(?P=a_8)|(?P=a_18)x|(?P=a_19)),(?:(?P=a_8)x|(?P=a_18)|(?P=a_19)),(?:(?P=a_20)x|(?P=a_19)x),(?:(?P=a_20)|(?P=a_19)),(?:(?P=a_20)),(?:(?P=a_21)x|(?P=a_7)x),(?:(?P=a_21)|(?P=a_7)),(?:(?P=a_21)|(?P=a_5)|(?P=a_22)x),(?:(?P=a_21)x|(?P=a_22)),(?:(?P=a_5)x|(?P=a_22)),(?:(?P=a_4)x|(?P=a_22)x|(?P=a_23)x),(?:(?P=a_4)|(?P=a_22)|(?P=a_23)x),(?:(?P=a_4)|(?P=a_22)x|(?P=a_23)),(?:(?P=a_4)x|(?P=a_22)|(?P=a_23)),(?:(?P=a_24)x|(?P=a_23)x),(?:(?P=a_24)|(?P=a_23)),(?:(?P=a_24)),(?:(?P=a_3)|(?P=a_7)|(?P=a_25)x),(?:(?P=a_3)x|(?P=a_25)),(?:(?P=a_7)x|(?P=a_25)),(?:(?P=a_6)x|(?P=a_25)x|(?P=a_26)),(?:(?P=a_6)|(?P=a_26)x),(?:(?P=a_25)|(?P=a_26)x),(?:(?P=a_6)x|(?P=a_26)x|(?P=a_27)x),(?:(?P=a_6)|(?P=a_26)|(?P=a_27)x),(?:(?P=a_6)|(?P=a_26)x|(?P=a_27)),(?:(?P=a_6)x|(?P=a_26)|(?P=a_27)),(?:(?P=a_28)x|(?P=a_27)x),(?:(?P=a_28)|(?P=a_27)),(?:(?P=a_28)),
SAT
-1 2 -3 4 5 -6 7 -8 9 -10 11 12 -13 14 15 -16 17 -18 -19 20 -21 22 -23 24 25 -26 -27 28

It took ~3 minutes on my old CPU clocked at ~2GHz.

The files, including the "solver" in Python3.

Of course, all this stuff isn't practical at all. But it demonstrates reduction from one problem (regex matching with backreferences) to another (SAT). Find a better algorithm for any of these problem and this would lead to revolution in computer science .


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Letting Go of the Words

Letting Go of the Words

Janice (Ginny) Redish / Morgan Kaufmann / 2007-06-11 / USD 49.95

"Redish has done her homework and created a thorough overview of the issues in writing for the Web. Ironically, I must recommend that you read her every word so that you can find out why your customer......一起来看看 《Letting Go of the Words》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试