develooper Front page | perl.perl5.porters | Postings from July 2000

use of 'no utf8' within a (??{ ... }) block?

From:
Jeffrey Friedl
Date:
July 31, 2000 11:14
Subject:
use of 'no utf8' within a (??{ ... }) block?
Message ID:
200007310810.BAA22303@ventrue.yahoo.com

I hesitate to call this a bug, since I'm so new to utf8 stuff, but I
thought that this program would have printed that there was no match:

    #!/usr/local/bin/perl -w
    use strict;
    use utf8;

    $_ = "A \x{263a} B";  # U+263A is a Unicode smiley, encoded in utf8 as three bytes

    if (m/A (??{ no utf8;  "." }) B/) {
        print "match [$&]\n";
    } else {
        print "no match\n";
    }


I would have expected the dot to match a single byte rather than a whole
character, because of the 'no utf8', but the debugging output for that
sub-regex compile shows:

    Compiling REx `.'
    size 2 first at 1
       1: ANYUTF8(2)
       2: END(0)
    minlen 1 

FWIW, it's the same when 'no utf8' is replaced by 'use bytes'.
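
For anyone who wants to look at a dump like the one above: one way to get it
is the `re` pragma's debug mode. This is only a sketch, and it assumes that
mechanism; the dump above may have been produced some other way, and the node
names printed can differ between Perl versions. The `$dot` variable is purely
illustrative.

```perl
use strict;
use warnings;

# Assumption: the 're' pragma's debug mode is used here to dump the
# compiled form of each regex to STDERR.
use re 'debug';

# Illustrative pattern only: a lone dot, like the sub-regex in the post.
my $dot = qr/./;

print "matched\n" if "x" =~ $dot;
```

The dump goes to STDERR, so it can be separated from the program's normal
output by redirecting that stream.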


Oddly enough (at least, oddly to me), if you replace the match line with

      if (m/A (??{  "\xE2\x98\xBA" }) B/) {  ## raw "\xE2\x98\xBA" is how
                                             ## \x{263a} is encoded in utf8

it still matches. In this case, whether or not "no utf8" appears within the
(??{ ... }) block doesn't matter. I would have expected it to match only if
"no utf8" (or "use bytes") were in the block.
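
As a sanity check on the byte sequence used above, here is a minimal sketch
that works out the UTF-8 bytes for a code point by hand, using only integer
arithmetic so it does not depend on any utf8 support in Perl itself. The
helper name `utf8_bytes_for` is made up for this illustration, and it does no
validation (surrogates, out-of-range values).

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 byte values for a code point.
# Plain bit arithmetic only; no validation of the input is attempted.
sub utf8_bytes_for {
    my ($cp) = @_;
    return ($cp) if $cp < 0x80;                      # 1 byte:  0xxxxxxx
    return (0xC0 | ($cp >> 6),                       # 2 bytes: 110xxxxx 10xxxxxx
            0x80 | ($cp & 0x3F)) if $cp < 0x800;
    return (0xE0 | ($cp >> 12),                      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)) if $cp < 0x10000;
    return (0xF0 | ($cp >> 18),                      # 4 bytes: 11110xxx then three 10xxxxxx
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F));
}

# Prints "E2 98 BA" for the smiley.
print join(' ', map { sprintf '%02X', $_ } utf8_bytes_for(0x263A)), "\n";
```

U+263A falls in the three-byte range (0x800 to 0xFFFF), which is why the raw
string in the match above is three bytes long.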

Bug with Perl? Feature? Bug with my understanding?

	Jeffrey


