perl.perl5.porters | Postings from August 2001

Re: UTF-8 bugs in string length & single line regex matches

From: Jarkko Hietaniemi
Date: August 4, 2001 08:18
Subject: Re: UTF-8 bugs in string length & single line regex matches
Message ID: 20010804101847.F16234@chaos.wustl.edu

On Fri, Aug 03, 2001 at 11:39:33AM +0100, Daniel P. Berrange wrote:
> I'm in the process of converting my employer's perl applications
> to use UTF-8 throughout and have come across a couple of
> interesting bugs when working with UTF-8 strings and perl 5.7.2.
> 
> The first is in the Perl_mg_length function, which causes the 
> string length to be reported in bytes rather than characters, 
> even though the UTF-8 flag is set. I've attached a patch 
> (against 5.7.2) containing a fix & new test case for t/op/length.t

Thanks, applied (as patch #11572, see http://public.activestate.com/cgi-bin/perlbrowse)
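
For readers following along, here is a minimal sketch of the behaviour the first report expects: length() on a string with the UTF-8 flag set should count characters, not bytes. It is not the test case attached to the original message, and the variable names are made up for illustration. The capture-variable check is included on the assumption that magical values such as $1 are the kind of case Perl_mg_length handles.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Illustrative string: one ASCII character plus two characters above U+00FF,
# so Perl stores it internally as UTF-8 with the UTF-8 flag set.
my $str = "a\x{263A}\x{20AC}";

# length() is expected to report characters.
my $chars = length($str);

# Encoding to UTF-8 octets gives the byte count for comparison.
my $bytes = length(encode_utf8($str));

# Assumption: a capture variable is a reasonable stand-in for a magical value.
my $captured = 0;
if ($str =~ /^(.*)$/s) {
    $captured = length($1);
}

printf "characters: %d, bytes: %d, length(\$1): %d\n",
    $chars, $bytes, $captured;
# prints "characters: 3, bytes: 7, length($1): 3"
```

On a Perl with the fix applied, the first and third numbers should agree; a byte count of 7 in either place would indicate the bug described above.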
 
> The second, in the regex engine, causes '.' to match against
> bytes rather than characters when using the /s modifier for
> the regex match. I thought I had a suitable patch; unfortunately,
> it merely succeeded in breaking \C instead :-( I've attached
> it anyway as it may help someone else develop a proper patch
> for this problem. Also attached a script to demo the problem.

Will investigate, thanks for the demo script.  (The \C is Evil.)
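
The expected behaviour for the second report can be sketched as follows. This is not the demo script from the original message, only an illustration under the assumption that the problem shows up with any string containing characters above U+00FF. The sketch leaves \C out, since it was removed from later Perl releases.

```perl
use strict;
use warnings;

# Illustrative string of three characters: U+263A, newline, U+20AC.
my $str = "\x{263A}\n\x{20AC}";

# With /s, '.' matches any single character, including the newline.
my @with_s = $str =~ /(.)/sg;

# Without /s, '.' skips the newline.
my @without_s = $str =~ /(.)/g;

# Every match should be exactly one character long, never a lone byte.
my $all_single = 1;
for my $m (@with_s) {
    $all_single = 0 unless length($m) == 1;
}

printf "with /s: %d matches, without /s: %d matches, single chars: %s\n",
    scalar(@with_s), scalar(@without_s), $all_single ? "yes" : "no";
# prints "with /s: 3 matches, without /s: 2 matches, single chars: yes"
```

If '.' were matching bytes under /s, the first count would be 7 rather than 3, because each of the two non-ASCII characters takes three bytes in UTF-8.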

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
