
Re: [perl #85034] tring match position on utf8 upgrade

From: perl5-porters
Date: February 28, 2011 14:48
Subject: Re: [perl #85034] tring match position on utf8 upgrade
Message ID: ikh8nu$659$19@post.home.lunix
In article <AANLkTimREgEmsxbGN5_Gt+pJk9svLu1u972VQ0JxBZKR@mail.gmail.com>,
	demerphq <demerphq@gmail.com> writes:
>> [Please describe your issue here]
>>
>> perl -wle '$_="\xce" x 20; pos($_) = 12; utf8::upgrade($_); print pos $_'
>> 6
>>
>> This is because the PERL_MAGIC_regex_global value is in bytes even if the
>> string is internally UTF8. If the string gets upgraded this position ought
>> to be recalculated
> 
> Or should it be treated as a character count?
> 
> Yves
> 
A byte count is probably more practical: you immediately know where to
continue matching even if you lose, or never had, the utf8 offset cache.
Having no utf8 offset cache seems to be pretty normal if
PERL_MAGIC_regex_global was set by a //g match rather than by explicitly
setting pos():

perl -wle 'use Devel::Peek; $_=join("\xce", "a" .. "z"); utf8::upgrade($_); /q/g; Dump($_)'
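
To make the byte/character mismatch from the original report concrete, here is
a rough sketch of the two conversions involved. Both helper names
(char_to_byte_offset, byte_to_char_offset) are made up for illustration and are
not part of perl; they only use substr, length, utf8::encode and utf8::decode.
This works at the Perl level and says nothing about how the core stores or
should recalculate the magic value.

```perl
use strict;
use warnings;

# Hypothetical helper (not in core): given a character offset into $str,
# return the byte offset that position has in the UTF-8 encoding of $str.
sub char_to_byte_offset {
    my ($str, $char_pos) = @_;
    my $prefix = substr($str, 0, $char_pos);
    utf8::encode($prefix);          # $prefix now holds UTF-8 octets
    return length $prefix;
}

# Hypothetical helper (not in core): the inverse. Given a byte offset into
# the UTF-8 encoding of $str, return the corresponding character offset.
sub byte_to_char_offset {
    my ($str, $byte_pos) = @_;
    my $octets = $str;
    utf8::encode($octets);
    my $prefix = substr($octets, 0, $byte_pos);
    utf8::decode($prefix)
        or die "byte offset $byte_pos falls inside a character\n";
    return length $prefix;
}

my $s = "\xce" x 20;                       # 20 characters, 1 byte each before upgrade

# Each "\xce" takes 2 bytes once upgraded, so character 12 sits at byte 24.
print char_to_byte_offset($s, 12), "\n";   # 24

# Reading the stale value 12 as a byte offset into the upgraded string
# lands on character 6, which is the number the report shows.
print byte_to_char_offset($s, 12), "\n";   # 6
```

So if the stored value stays a byte count, an upgrade would have to move it
from 12 to 24 for pos() to keep reporting 12.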
