perl.perl5.porters | Postings from May 2011

Re: [perl #85034] tring match position on utf8 upgrade

From: Dave Mitchell
Date: May 16, 2011 03:10
Subject: Re: [perl #85034] tring match position on utf8 upgrade
Message ID: 20110516101030.GD2741@iabyn.com
On Wed, Mar 02, 2011 at 06:20:31AM +0100, Aristotle Pagaltzis wrote:
> * Ton Hospel <perl5-porters@ton.iguana.be> [2011-02-28 23:50]:
> > In article <AANLkTimREgEmsxbGN5_Gt+pJk9svLu1u972VQ0JxBZKR@mail.gmail.com>,
> > 	demerphq <demerphq@gmail.com> writes:
> > >> perl -wle '$_="\xce" x 20; pos($_) = 12; utf8::upgrade($_); print pos $_'
> > >> 6
> > >>
> > >> This is because the PERL_MAGIC_regex_global value is in
> > >> bytes even if the string is internally UTF8. If the string
> > >> gets upgraded, this position ought to be recalculated.
> > >
> > > Or should it be treated as a character count?
> > >
> > Byte count is probably more practical, so that you immediately
> > know where to continue matching even if you lose or don't have
> > the utf8 offset cache. Having no utf8 offset cache seems to be
> > quite normal if you get PERL_MAGIC_regex_global from a //g match
> > instead of explicitly setting pos().
> >
> > perl -wle 'use Devel::Peek; $_=join("\xce", "a" .. "z"); utf8::upgrade($_); /q/g; Dump($_)'
> 
> Ideally there would be a byte offset stored internally but the
> `pos` function would return and expect a character offset. (The
> user should never be exposed to the underlying implementation.)
> That means recalculating the byte offset when up- or downgrading
> a string (which is almost zero extra cost since you have to scan
> it anyway) and doing a char→byte conversion when the user sets it
> using `pos`.
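
The scheme Aristotle describes can be illustrated with a small pure-Perl model. This is a hypothetical sketch, not how perl actually implements PERL_MAGIC_regex_global: the hash standing in for a scalar, its `utf8` flag and `pos_bytes` field, and the helpers `set_pos`, `get_pos`, `upgrade` and `downgrade` are all invented for the example. The position is kept as a byte offset into the (modelled) buffer, converted to and from a character offset whenever the user reads or sets it, and recomputed when the buffer's encoding changes.

```perl
use strict;
use warnings;

# Hypothetical model, NOT perl's internals: a "scalar" is a hash holding the
# characters (str), whether its buffer is UTF-8 encoded (utf8), and pos
# stored as a BYTE offset into that buffer (pos_bytes).

# Character offset -> byte offset in the modelled buffer.
sub char_to_byte {
    my ($sv, $chars) = @_;
    return $chars unless $sv->{utf8};       # non-UTF-8 buffer: 1 byte per char
    my $prefix = substr($sv->{str}, 0, $chars);
    utf8::encode($prefix);                  # characters -> UTF-8 octets
    return length $prefix;
}

# Byte offset -> character offset (assumes it lies on a character boundary).
sub byte_to_char {
    my ($sv, $bytes) = @_;
    return $bytes unless $sv->{utf8};
    my $buf = $sv->{str};
    utf8::encode($buf);
    my $prefix = substr($buf, 0, $bytes);
    utf8::decode($prefix) or die "byte offset $bytes splits a character\n";
    return length $prefix;
}

# The user-visible interface always speaks in characters.
sub set_pos {
    my ($sv, $chars) = @_;
    $sv->{pos_bytes} = char_to_byte($sv, $chars);
}

sub get_pos {
    my ($sv) = @_;
    return undef unless defined $sv->{pos_bytes};
    return byte_to_char($sv, $sv->{pos_bytes});
}

# Changing the buffer's encoding re-derives the byte offset from the
# character offset; skipping this step is what produced the wrong pos.
sub upgrade {
    my ($sv) = @_;
    return if $sv->{utf8};
    my $chars = get_pos($sv);
    $sv->{utf8} = 1;
    set_pos($sv, $chars) if defined $chars;
}

# Assumes every character fits in one byte (real downgrades can fail).
sub downgrade {
    my ($sv) = @_;
    return unless $sv->{utf8};
    my $chars = get_pos($sv);
    $sv->{utf8} = 0;
    set_pos($sv, $chars) if defined $chars;
}

# Mirrors the one-liner from the report: pos 12 survives an upgrade.
my $sv = { str => "\xce" x 20, utf8 => 0, pos_bytes => undef };
set_pos($sv, 12);
upgrade($sv);
print "chars: ", get_pos($sv), ", bytes: $sv->{pos_bytes}\n";  # chars: 12, bytes: 24
```

Each "\xce" (U+00CE) takes two bytes in UTF-8, so after the upgrade the stored byte offset doubles to 24 while the character offset the user sees stays at 12.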

This bug was fixed with commit

75da9d4c616bae3e6791af93d2ced52dc8080f06

-- 
O Unicef Clearasil!
Gibberish and Drivel!
    -- "Bored of the Rings"
