In article <AANLkTimREgEmsxbGN5_Gt+pJk9svLu1u972VQ0JxBZKR@mail.gmail.com>, demerphq <demerphq@gmail.com> writes:
>> [Please describe your issue here]
>>
>> perl -wle '$_="\xce" x 20; pos($_) = 12; utf8::upgrade($_); print pos $_'
>> 6
>>
>> This is because the PERL_MAGIC_regex_global value is in bytes even if the
>> string is internally UTF8. If the string gets upgraded this position ought
>> to be recalculated
>
> Or should it be treated as a character count?
>
> Yves
>

A byte count is probably more practical, so that you immediately know where
to continue matching even if you lose or don't have the utf8 offset cache.

Having no utf8 offset cache seems to be pretty normal if you get
PERL_MAGIC_regex_global from a //g match instead of explicitly setting pos():

perl -wle 'use Devel::Peek; $_=join("\xce", "a" .. "z"); utf8::upgrade($_); /q/g; Dump($_)'
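
For anyone following along, here is a rough sketch of the arithmetic behind the "6" in the report. Once "\xce" x 20 is upgraded, every character takes two bytes in the internal UTF-8 buffer, so a stored byte offset of 12 reads back as character 6, while the position that was originally meant (character 12) now sits at byte 24. The two helpers below are hypothetical names made up for this illustration, not anything in perl itself, and they work on an encoded copy of the string rather than looking at the pos magic.

```perl
use strict;
use warnings;

# Hypothetical helper: bytes occupied in UTF-8 by the first $chars characters.
sub utf8_byte_offset {
    my ($str, $chars) = @_;
    my $prefix = substr($str, 0, $chars);
    utf8::encode($prefix);              # in place: characters -> UTF-8 octets
    return length $prefix;
}

# Hypothetical helper: characters covered by the first $bytes UTF-8 bytes.
# Assumes $bytes falls on a character boundary.
sub utf8_char_offset {
    my ($str, $bytes) = @_;
    my $octets = $str;
    utf8::encode($octets);              # work on an encoded copy
    my $prefix = substr($octets, 0, $bytes);
    utf8::decode($prefix);              # in place: UTF-8 octets -> characters
    return length $prefix;
}

my $s = "\xce" x 20;
utf8::upgrade($s);

print utf8_char_offset($s, 12), "\n";   # 6:  a stale byte offset read as characters
print utf8_byte_offset($s, 12), "\n";   # 24: where character 12 really is after the upgrade
```

This walks the string from the start each time, which is exactly the cost the utf8 offset cache is there to avoid.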