develooper Front page | perl.fwp | Postings from March 2005

Re: Self-recognizing programs and regular expressions

Thread Previous | Thread Next
From:
Karsten Sperling
Date:
March 8, 2005 11:18
Subject:
Re: Self-recognizing programs and regular expressions
Message ID:
1358707809.20050308201802@stud.uni-karlsruhe.de
Here's one without \Q, which I'm fairly certain is score 1:

(^(?=.*(\\2\?\\))(?=.{338}\)\{2}\.\{11}\\z\z)(?=([^(]*\(){9}.{278}\z))(\2?\(\2?\^\2?\(\2?\?=\2?\.\2?\*\2?\(\2?\\\2?\\2\2?\\\2?\?\2?\\\2?\\\2?\)\2?\)\2?\(\2?\?=\2?\.\2?\{338}\2?\\\2?\)\2?\\\2?\{2}\2?\\\2?\.\2?\\\2?\{11}\2?\\\2?\\z\2?\\z\2?\)\2?\(\2?\?=\2?\(\2?\[\2?\^\2?\(\2?\]\2?\*\2?\\\2?\(\2?\)\2?\{9}\2?\.\2?\{278}\2?\\z\2?\)\2?\)\2?\(){2}.{11}\z

The trick is that \2 is the string "\2?\" (i'm not escaping \ with \\
here for clarity), so the regex \2?\* matches "*" as well as "\2?\*".
That gives us a regex that matches either a meta-character or that
same regex matching that meta-character (non-meta-characters aren't a
problem because they match themselves and hence the regex that matches
themselves already).

My first attempt was therefore this (whitespace for clarity):

  (^(?=.*(\\2\?\\)))
  (
    \2?\(\2?\^\2?\(\2?\?=\2?\.\2?\*\2?\(\2?\\
    \2?\\2\2?\\\2?\?\2?\\\2?\\\2?\)\2?\)\2?\)
    \2?\(
  ){2}

The regex (^(?=.*(\\2\?\\))) sets up \2, i'll call it R. The next part
is the "escaped" version of R, which i'll call S. S matches the string
R, as well as S itself. So S{2} matches RS. (In fact S also includes
the "(" that comes after R, but thats not important here.)

Unfortunately, S{2} also matches SS, RR, and any other string derived
from SS by removing some or all instances of "\2?\". That's because
these "\2?\" in front of each meta-char is matched optionally.

Using (S){2}.{11}\z makes sure the second match of S goes all the way
through the string to within 11 chars of the end. (11 being the length
of "){2}.{11}\z"). R is extended to verify that these are in fact the
last 11 chars of the string. Additionally, we verify that the string
has the right length: (?=.{338}\)\{2}\.\{11}\\z\z)

It's still possible to shift "\2?\" strings from the second part of
the string to the first though, leaving the length unchanged. To
prevent this, a check is added to R to verify that the opening '(' at
the start of S (the 9th in the final code) is at the right position:
  (?=([^(]*\(){9}.{278}\z)

Here's the whole code again:

my $x = <<'EOF'; $x =~ s/\n//g; print "OK" if $x =~ /$x/;
(^(?=.*(\\2\?\\))(?=.{338}\)\{2}\.\{11}\\z\z)(?=([^(]*\(){9}.{278}\z))
(\2?\(\2?\^\2?\(\2?\?=\2?\.\2?\*\2?\(\2?\\\2?\\2\2?\\\2?\?\2?\\\2?\\\2
?\)\2?\)\2?\(\2?\?=\2?\.\2?\{338}\2?\\\2?\)\2?\\\2?\{2}\2?\\\2?\.\2?\\
\2?\{11}\2?\\\2?\\z\2?\\z\2?\)\2?\(\2?\?=\2?\(\2?\[\2?\^\2?\(\2?\]\2?\
*\2?\\\2?\(\2?\)\2?\{9}\2?\.\2?\{278}\2?\\z\2?\)\2?\)\2?\(){2}.{11}\z
EOF

- Karsten


Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About