unicode string "feature"?

From: Tom Christiansen
Date: March 11, 2011 10:09
Subject: unicode string "feature"?
Message ID: 22252.1299866976@chthon

How come these aren't the same?

    % blead -E 'say "\N{U+DF}" =~ /ss/iu || 0'
    1

    % blead -E 'say    "\xDF"  =~ /ss/iu || 0'
    0

I thought the /u would force Unicode casing semantics
no matter whether the UTF-8 flag is on or not.
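
The expectation above can be written as a small standalone check. This is only a sketch: the helper name matches_ss is made up for illustration, and it assumes a perl recent enough to accept the /u modifier (5.14 or later). It builds the same one-character string in both internal representations and runs the same /iu match against each. If /u really is independent of the UTF-8 flag, the last two lines should print the same value.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: 1 if the string matches "ss" case-insensitively
# under Unicode rules (/u), 0 otherwise.
sub matches_ss {
    my ($str) = @_;
    return $str =~ /ss/iu ? 1 : 0;
}

# Two copies of the same one-character string, U+00DF.
my $upgraded   = "\xDF";
my $downgraded = "\xDF";
utf8::upgrade($upgraded);       # internal UTF-8 representation, flag on
utf8::downgrade($downgraded);   # internal byte representation, flag off

say "equal as strings: ", ($upgraded eq $downgraded ? 1 : 0);
say "upgraded:         ", matches_ss($upgraded);
say "downgraded:       ", matches_ss($downgraded);
```

The two strings compare equal with eq, so any difference between the last two lines comes from the internal representation alone, not from the string's contents.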

Does it somehow have something to do with this?

    % blead -Mbytes -E 'say "\N{U+DF}" =~ /\xDF/ui || 0'
    0

    % blead -Mbytes -E 'say "\xDF" =~ /\N{U+DF}/ui || 0'
    0

Those last two compile to rather different (and surprising-looking) 
regexes.
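
For context on what use bytes changes here, the following sketch compares the character length and the internal byte length of the two string literals used above. It assumes, as the "UTF-8 string..." and "(2 bytes)" lines in the debug output below suggest, that "\N{U+DF}" is stored internally as UTF-8 while a bare "\xDF" is stored as a single byte. It loads the bytes module only for bytes::length and does not turn on byte semantics.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling byte semantics

my $named   = "\N{U+DF}";   # assumed to be stored as UTF-8 internally
my $literal = "\xDF";       # assumed to be stored as one byte

for my $pair ([named => $named], [literal => $literal]) {
    my ($label, $str) = @$pair;
    # length() counts characters; bytes::length() counts internal bytes.
    printf "%-8s chars=%d bytes=%d utf8_flag=%d\n",
        $label, length($str), bytes::length($str),
        utf8::is_utf8($str) ? 1 : 0;
}

print "equal as strings: ", ($named eq $literal ? 1 : 0), "\n";
```

Under use bytes the regex engine sees the internal bytes rather than the characters, which is consistent with the target string appearing as "%x{c3}%x{9f}" in the -Mbytes debug output.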

This is where my use bytes question was coming from. I'm trying to
see whether it ever makes sense any longer to use bytes.
(We already know my laments about use utf8.)

Thanks.  

--tom

    % blead -Mre=debug -E 'say "\N{U+DF}" =~ /ss/iu || 0'
    Compiling REx "ss"
    Final program:
       1: EXACTFU <ss> (3)
       3: END (0)
    stclass EXACTFU <ss> minlen 2 
    Matching REx "ss" against "%x{df}"
    UTF-8 string...
    Matching stclass EXACTFU <ss> against "%x{df}" (2 bytes)
       0 <> <%x{df}>             |  1:EXACTFU <ss>(3)
       2 <%x{df}> <>             |  3:END(0)
    Match successful!
    1
    Freeing REx: "ss"

    % blead -Mre=debug -E 'say "\xDF" =~ /ss/iu || 0'
    Compiling REx "ss"
    Final program:
       1: EXACTFU <ss> (3)
       3: END (0)
    stclass EXACTFU <ss> minlen 2 
    0
    Freeing REx: "ss"

    % blead -Mre=debug -E 'say "\N{U+DF}\N{U+DF}" =~ /ss/iu || 0'
    Compiling REx "ss"
    Final program:
       1: EXACTFU <ss> (3)
       3: END (0)
    stclass EXACTFU <ss> minlen 2 
    Matching REx "ss" against "%x{df}%x{df}"
    UTF-8 string...
    Matching stclass EXACTFU <ss> against "%x{df}%x{df}" (4 bytes)
       0 <> <%x{df}>             |  1:EXACTFU <ss>(3)
       2 <%x{df}> <%x{df}>       |  3:END(0)
    Match successful!
    1
    Freeing REx: "ss"

    % blead -Mre=debug -E 'say "\xDF\xDF" =~ /ss/iu || 0'
    Compiling REx "ss"
    Final program:
       1: EXACTFU <ss> (3)
       3: END (0)
    stclass EXACTFU <ss> minlen 2 
    Matching REx "ss" against "%x{df}%x{df}"
    Matching stclass EXACTFU <ss> against "%x{df}%x{df}" (2 bytes)
       0 <> <%x{df}>             |  1:EXACTFU <ss>(3)
				      failed...
    Contradicts stclass... [regexec_flags]
    Match failed
    0
    Freeing REx: "ss"

    % blead -Mre=debug -Mbytes -E 'say "\N{U+DF}" =~ /\xDF/ui || 0'
    Compiling REx "\xDF"
    Final program:
       1: ANYOFV{i}[\xdf][{unicode}{outside bitmap}00df] (12)
      12: END (0)
    minlen 0 
    Matching REx "\xDF" against "%x{c3}%x{9f}"
       0 <> <%x{c3}>             |  1:ANYOFV{i}[\xdf][{unicode}{outside bitmap}00df](12)
				      failed...
       1 <%x{c3}> <%x{9f}>       |  1:ANYOFV{i}[\xdf][{unicode}{outside bitmap}\xc3\x9f...00df](12)
				      failed...
       2 <%x{c3}%x{9f}> <>       |  1:ANYOFV{i}[\xdf][{unicode}{outside bitmap}\xc3\x9f...00df](12)
				      failed...
    Match failed
    0
    Freeing REx: "\xDF"

    % blead -Mre=debug -Mbytes -E 'say "\xDF" =~ /\N{U+DF}/ui || 0'
    Compiling REx "\N{U+DF}"
    Final program:
       1: EXACTFU <ss> (3)
       3: END (0)
    stclass EXACTFU <ss> minlen 2 
    0
    Freeing REx: "\N{U+DF}"
