 Front page | perl.fwp | Postings from March 2005

## Re: Self-recognizing programs and regular expressions

From:
Karsten Sperling
Date:
March 8, 2005 11:18
Subject:
Re: Self-recognizing programs and regular expressions
Message ID:
1358707809.20050308201802@stud.uni-karlsruhe.de
```Here's one without \Q, which I'm fairly certain is score 1:

(^(?=.*(\\2\?\\))(?=.{338}\)\{2}\.\{11}\\z\z)(?=([^(]*\(){9}.{278}\z))(\2?\(\2?\^\2?\(\2?\?=\2?\.\2?\*\2?\(\2?\\\2?\\2\2?\\\2?\?\2?\\\2?\\\2?\)\2?\)\2?\(\2?\?=\2?\.\2?\{338}\2?\\\2?\)\2?\\\2?\{2}\2?\\\2?\.\2?\\\2?\{11}\2?\\\2?\\z\2?\\z\2?\)\2?\(\2?\?=\2?\(\2?\[\2?\^\2?\(\2?\]\2?\*\2?\\\2?\(\2?\)\2?\{9}\2?\.\2?\{278}\2?\\z\2?\)\2?\)\2?\(){2}.{11}\z

The trick is that \2 is the string "\2?\" (i'm not escaping \ with \\
here for clarity), so the regex \2?\* matches "*" as well as "\2?\*".
That gives us a regex that matches either a meta-character or that
same regex matching that meta-character (non-meta-characters aren't a
problem because they match themselves and hence the regex that matches

My first attempt was therefore this (whitespace for clarity):

(^(?=.*(\\2\?\\)))
(
\2?\(\2?\^\2?\(\2?\?=\2?\.\2?\*\2?\(\2?\\
\2?\\2\2?\\\2?\?\2?\\\2?\\\2?\)\2?\)\2?\)
\2?\(
){2}

The regex (^(?=.*(\\2\?\\))) sets up \2, i'll call it R. The next part
is the "escaped" version of R, which i'll call S. S matches the string
R, as well as S itself. So S{2} matches RS. (In fact S also includes
the "(" that comes after R, but thats not important here.)

Unfortunately, S{2} also matches SS, RR, and any other string derived
from SS by removing some or all instances of "\2?\". That's because
these "\2?\" in front of each meta-char is matched optionally.

Using (S){2}.{11}\z makes sure the second match of S goes all the way
through the string to within 11 chars of the end. (11 being the length
of "){2}.{11}\z"). R is extended to verify that these are in fact the
last 11 chars of the string. Additionally, we verify that the string
has the right length: (?=.{338}\)\{2}\.\{11}\\z\z)

It's still possible to shift "\2?\" strings from the second part of
the string to the first though, leaving the length unchanged. To
prevent this, a check is added to R to verify that the opening '(' at
the start of S (the 9th in the final code) is at the right position:
(?=([^(]*\(){9}.{278}\z)

Here's the whole code again:

my \$x = <<'EOF'; \$x =~ s/\n//g; print "OK" if \$x =~ /\$x/;
(^(?=.*(\\2\?\\))(?=.{338}\)\{2}\.\{11}\\z\z)(?=([^(]*\(){9}.{278}\z))
(\2?\(\2?\^\2?\(\2?\?=\2?\.\2?\*\2?\(\2?\\\2?\\2\2?\\\2?\?\2?\\\2?\\\2
?\)\2?\)\2?\(\2?\?=\2?\.\2?\{338}\2?\\\2?\)\2?\\\2?\{2}\2?\\\2?\.\2?\\
\2?\{11}\2?\\\2?\\z\2?\\z\2?\)\2?\(\2?\?=\2?\(\2?\[\2?\^\2?\(\2?\]\2?\
*\2?\\\2?\(\2?\)\2?\{9}\2?\.\2?\{278}\2?\\z\2?\)\2?\)\2?\(){2}.{11}\z
EOF

- Karsten

```