
Re: [perl #21372] AutoReply: utf8 in regex leads to corruption when used with uc($1)

From: Dominic Mitchell
Date: February 26, 2003 07:22
Subject: Re: [perl #21372] AutoReply: utf8 in regex leads to corruption when used with uc($1)
Message ID: 3E5CD9F3.5040701@semantico.com

perlbug-followup@perl.org wrote:
>     #!/usr/bin/perl -w
>     use strict;
>     use warnings;
>     use Encode;
>     # Various bits taken from HTML::Mason::Lexer::match_block().
>     my $blocks_re = qr/once|flags|filter|args|attr|init|shared|perl|text|doc|cleanup/i;
>     my $comp_source = <<'WIBBLE';
>     <%args>
>     $foo
>     </%args>
>     This is a pretend mason component.
>     WIBBLE
>     Encode::_utf8_on( $comp_source );
> 
>     if ( $comp_source =~ /\G<%($blocks_re)>/igcs ) {
>         print "\$1 is $1\n";
>         my $type = lc $1;
>         print "[1] \$type is '$type'\n";
>         print "[2] \$type is '$type'\n";
>     }

In my haste to file this, I forgot to include the output.  Here it is, 
showing the error:

% perl fred.pl
$1 is args
[1] $type is 'a0c0'
[2] $type is 'a0c0'

If you comment out the _utf8_on() call, you instead get this (correct) 
output:

% perl fred.pl
$1 is args
[1] $type is 'args'
[2] $type is 'args'
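
For anyone who wants to check whether their own perl is affected, the 
comparison above can be sketched as a single script that runs the same 
kind of match with and without the flag and compares the lowercased 
captures.  This is only a rough sketch: the helper name 
lowercased_capture() and the shortened alternation are made up for 
illustration and are not taken from Mason.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of the original report): match against a
# private copy of the source, optionally flagged as UTF-8, and return lc($1).
sub lowercased_capture {
    my ( $source, $flag_utf8 ) = @_;
    Encode::_utf8_on($source) if $flag_utf8;    # same call as in the report
    return unless $source =~ /\G<%(args|init|perl)>/igcs;
    return lc $1;
}

my $comp_source = "<%args>\n\$foo\n</%args>\n";

my $plain   = lowercased_capture( $comp_source, 0 );
my $flagged = lowercased_capture( $comp_source, 1 );

print "plain:   '$plain'\n";
print "flagged: '$flagged'\n";
print $plain eq $flagged
    ? "captures agree\n"
    : "captures differ: this perl looks affected\n";
```

The helper works on a copy of the string, so the flagged run cannot 
influence the unflagged one.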

-Dom

-- 
| Semantico: creators of major online resources          |
|       URL: http://www.semantico.com/                   |
|       Tel: +44 (1273) 722222                           |
|   Address: 33 Bond St., Brighton, Sussex, BN1 1RD, UK. |



