perl.perl5.porters
so funny it makes you cry
From: Tom Christiansen
Date: December 7, 2010 23:03
Subject: so funny it makes you cry
Message ID: 17013.1291791743@chthon
Karl,
Remember when you put the stuff in to be more careful about
\cX?
Get this...
The Java regex engine blindly xors the X in a \cX character
with 64. It doesn't check whether the result is a control
character. So \cA is Control-A, but \c\x01 is A. And it
even gets weirder as you get higher. Look what it does with
a non-BMP character! These are all true:
    import java.io.PrintStream;
    import java.util.regex.Pattern;

    class CtrlX {
        public static void main(String[] args) throws Exception {
            // UTF-8 output so the non-ASCII test characters print intact
            PrintStream stdout = new PrintStream(System.out, true, "UTF-8");
            String s, r;
            s = "\u0001"; r = "A";
            stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
                Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");
            s = "A"; r = "\u0001";
            stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
                Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");
            s = ";"; r = "{";
            stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
                Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");
            s = "\u00A9"; r = "\u00E9";
            stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
                Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");
            r = "\uFA76"; s = "\uFA36";
            stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
                Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");
            r = "\uD87E\uDC65"; // U+2F865 in disgustomatic UTF-16
            s = "\uD87E\uDC25"; // U+2F825 in disgustomatic UTF-16
            stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
                Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");
        }
    }
Produces:
<> =~ /\cA/ == true
<A> =~ /\c/ == true
<;> =~ /\c{/ == true
<©> =~ /\cé/ == true
<喝> =~ /\c勇/ == true
<勇> =~ /\c姘/ == true
Isn't that *special*?
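For anyone who wants to see the arithmetic behind those matches, here is a minimal sketch. It assumes the behavior described above (the code point after \c is XORed with 64 and nothing else is checked); the class name CtrlXorSketch and the helpers xor64 and engineAgrees are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

class CtrlXorSketch {
    // Hypothetical helper: the code point that \cX would stand for
    // if the engine simply flips bit 6 (XOR with 64) of X.
    static int xor64(int codePoint) {
        return codePoint ^ 64;
    }

    // Hypothetical helper: builds the pattern \cX and asks the regex
    // engine whether it matches the single code point xor64(X).
    static boolean engineAgrees(int x) {
        String pattern = "\\c" + new String(Character.toChars(x));
        String subject = new String(Character.toChars(xor64(x)));
        return Pattern.compile(pattern).matcher(subject).find();
    }

    public static void main(String[] args) {
        // The same X values used in the test cases above.
        int[] samples = { 'A', 0x01, '{', 0xE9, 0xFA76, 0x2F865 };
        for (int x : samples) {
            System.out.printf("\\c + U+%04X -> U+%04X, engine agrees: %b%n",
                x, xor64(x), engineAgrees(x));
        }
    }
}
```

Flipping bit 6 is its own inverse, which is why \cA gives U+0001 and \c followed by U+0001 gives A; the same flip takes U+2F865 to U+2F825.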
I find it repugnant to put that idiocy in my compat-lib.
--tom