develooper Front page | perl.perl5.porters | Postings from February 2011

Re: [perl #85034] String match position on utf8 upgrade

February 28, 2011 14:48
demerphq writes:
>> [Please describe your issue here]
>> perl -wle '$_="\xce" x 20; pos($_) = 12; utf8::upgrade($_); print pos $_'
>> 6
>> This is because the PERL_MAGIC_regex_global value is in bytes even if the
>> string is internally UTF8. If the string gets upgraded this position ought
>> to be recalculated.
> Or should it be treated as a character count?
> Yves
A byte count is probably more practical, because you immediately know where
to continue matching even if you lose, or never had, the utf8 offset cache.
Having no utf8 offset cache seems to be pretty normal if you get
PERL_MAGIC_regex_global from a //g match instead of explicitly setting pos:

perl -wle 'use Devel::Peek; $_=join("\xce", "a" .. "z"); utf8::upgrade($_); /q/g; Dump($_)'
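
To make the byte/character distinction concrete, here is a minimal sketch of the two conversions for the string from the report. The helper names char_to_byte_offset and byte_to_char_offset are hypothetical, introduced only for illustration; this is not how the core does the recalculation, just the arithmetic behind the numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: how many UTF-8 bytes do the first $chars
# characters of $str occupy once the string is stored as UTF-8?
sub char_to_byte_offset {
    my ($str, $chars) = @_;
    my $prefix = substr($str, 0, $chars);
    utf8::encode($prefix);    # characters -> UTF-8 octets, in place
    return length $prefix;
}

# Hypothetical helper: which character offset does byte offset $bytes
# correspond to in the UTF-8 form of $str?
sub byte_to_char_offset {
    my ($str, $bytes) = @_;
    my $octets = $str;
    utf8::encode($octets);    # work on a copy of the UTF-8 octets
    my $prefix = substr($octets, 0, $bytes);
    utf8::decode($prefix)
        or die "byte offset $bytes falls inside a character\n";
    return length $prefix;
}

my $s = "\xce" x 20;          # 20 characters, each 2 bytes in UTF-8

# Where pos 12 ought to point, in bytes, after the upgrade.
print char_to_byte_offset($s, 12), "\n";    # prints 24

# What a stale byte offset of 12 means against the upgraded string.
print byte_to_char_offset($s, 12), "\n";    # prints 6
```

The second number is the 6 from the report: the stored offset 12 is left unchanged while the buffer underneath it doubles in size, so it ends up pointing halfway to where it should.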
