develooper Front page | perl.perl5.porters | Postings from July 2000

use of 'no utf8' within a (??{ ... }) block?

Jeffrey Friedl
July 31, 2000 11:14

I hesitate to call this a bug, since I'm so new to utf8 stuff, but I
thought that this program would have printed that there was no match:

    #!/usr/local/bin/perl -w
    use strict;
    use utf8;

    $_ = "A \x{263a} B";  # 263a is a unicode smiley encoded with three bytes

    if (m/A (??{ no utf8;  "." }) B/) {
	print "match [$&]\n";
    } else {
	print "no match\n";
    }

I would have thought that, because of the 'no utf8', the dot would match a
single byte rather than a whole character, but the debugging output for
that sub-regex compile shows:

    Compiling REx `.'
    size 2 first at 1
       1: ANYUTF8(2)
       2: END(0)
    minlen 1 

FWIW, it's the same when 'no utf8' is replaced by 'use bytes'.

Oddly enough (at least, oddly to me), if you replace the match line with

      if (m/A (??{  "\xE2\x98\xBA" }) B/) {  ## raw "\xE2\x98\xBA" is how
					     ## \x{263a} is encoded in utf8

it still matches. In this case, the use or not of "no utf8" within the
(??{ ... }) block doesn't matter. I would have expected it to match only if
"no utf8" (or "use bytes") were in the block.
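
As a sanity check on the three-byte sequence quoted above, here is a minimal
sketch that prints the UTF-8 bytes of the smiley. It assumes a perl recent
enough to provide the built-in utf8::encode function (5.8 or later, so newer
than the perl discussed here), and the helper name utf8_bytes_hex is
hypothetical, made up for this illustration only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program): return the
# UTF-8 bytes of a string as space-separated uppercase hex pairs.
sub utf8_bytes_hex {
    my ($chars) = @_;
    my $bytes = $chars;      # work on a copy, leave the caller's string alone
    utf8::encode($bytes);    # in place: characters become UTF-8 bytes
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

print utf8_bytes_hex("\x{263a}"), "\n";   # prints "E2 98 BA"
```

It only confirms which bytes make up the character; it says nothing about
how the regex engine treats them inside a (??{ ... }) block.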

Bug with Perl? Feature? Bug with my understanding?

	Jeffrey