so funny it makes you cry

From: Tom Christiansen
Date: December 7, 2010 23:03
Subject: so funny it makes you cry
Message ID: 17013.1291791743@chthon

Karl,

Remember when you put the stuff in to be more careful about
\cX? 

Get this...

The Java regex engine blindly XORs the code point of the X in a
\cX escape with 64.  It doesn't check whether the result is a control
character.  So \cA is Control-A, but \c\x01 is A.  And it gets
even weirder as you go higher.  Look what it does with
a non-BMP character!  These are all true:

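    // Fragment: needs java.io.PrintStream and java.util.regex.Pattern,
    // and a caller that handles UnsupportedEncodingException.
    PrintStream stdout = new PrintStream(System.out, true, "UTF-8");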
    String s, r;

    s = "\u0001"; r = "A";
    stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
	Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");

    s = "A"; r = "\u0001"; 
    stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
	Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");

    s = ";"; r = "{";
    stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
	Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");

    s = "\u00A9"; r = "\u00E9";
    stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
	Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");

    r = "\uFA76"; s = "\uFA36";
    stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
	Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");

    r = "\uD87E\uDC65";  // U+2F865 in disgustomatic UTF-16
    s = "\uD87E\uDC25";  // U+2F825 in disgustomatic UTF-16
    stdout.printf("<%s> =~ /%s/ == %s\n", s, "\\c"+r,
	Pattern.compile("\\c"+r).matcher(s).find() ? "true" : "false");

Produces:

    <> =~ /\cA/ == true
    <A> =~ /\c/ == true
    <;> =~ /\c{/ == true
    <©> =~ /\cé/ == true
    <喝> =~ /\c勇/ == true
    <勇> =~ /\c姘/ == true

Isn't that *special*? 
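
For anyone who wants to check the arithmetic, here is a minimal sketch.
It assumes the engine does nothing more than flip the 64 bit of whatever
code point follows \c, as described above; other JDK versions may not
behave the same way.  The class name ControlXor and the helper xor64 are
made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

public class ControlXor {
    // Assumed model of the engine: XOR the code point after \c with 64.
    static int xor64(int codePoint) {
        return codePoint ^ 64;
    }

    // Build a one-code-point string, including non-BMP code points.
    static String str(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // The \c operands from the six cases above, as code points.
        int[] operands = { 'A', 0x01, '{', 0xE9, 0xFA76, 0x2F865 };
        for (int cp : operands) {
            int predicted = xor64(cp);
            // Ask the real engine whether \c<cp> matches the predicted character.
            boolean engineAgrees =
                Pattern.compile("\\c" + str(cp)).matcher(str(predicted)).find();
            // A careful engine would insist that the result is a control character.
            boolean isControl = Character.isISOControl(predicted);
            System.out.printf("\\c U+%04X -> U+%04X  control=%b  engine agrees=%b%n",
                              cp, predicted, isControl, engineAgrees);
        }
    }
}
```

Only the first operand yields something that Character.isISOControl
accepts; the rest are ordinary characters with one bit flipped.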

I find it repugnant to put that idiocy in my compat-lib.

--tom


