On Wed Sep 26 10:21:50 2012, sprout wrote:
> On Wed Sep 26 10:10:44 2012, sprout wrote:
> > On Wed Sep 26 09:08:38 2012, perl@profvince.com wrote:
> > > > Even if we fix length magic to set the utf8 flag we will still have a
> > > > problem with get-magic being called too many times in those cases where
> > > > length magic falls back to get-magic. There is no way to tell
> > > > afterwards whether length magic resorted to get-magic or not.
> > >
> > > One possible solution would be to replace mg_length() by a new
> > > mg_length_flags() function that would take a 'flags' parameter telling
> > > it whether it can call 'get' magic if no 'len' magic is available. I'm
> > > not sure it's worth the trouble though.
> >
> > Probably not.
> >
> > Anyway, this is worse than I thought:
> >
> > sv_len calls mg_length on gmagical variables, and returns the byte
> > length otherwise.
> >
> > mg_length calls length magic if available, and returns the number of
> > *characters* otherwise.
> >
> > Length magic on match vars (the only scalar length magic) returns the
> > length in bytes.
> >
> > I’ve locally changed sv_len to stop calling mg_length, but mg_length is
> > an API function, so I cannot just remove it. But what *should* it do?
> > It currently returns bytes for $1 but characters for $^A.
>
> Correction: $^A does have length magic (but Perl_magic_len, which
> handles it, falls back to get-magic and SvCUR), so mg_length returns
> bytes for $^A. But it returns characters for substr lvalues.
>
> Also, I have just noticed that when Perl_magic_len falls back to SvCUR,
> it only does so for variables that are SvPOK after sv_2pv, and returns 0
> otherwise. (That means pos can be set on $_, but not $/, when it
> contains a reference or glob).

The plot thickens.

sv_len_utf8 does not make sense. It assumes that the string is UTF-8. If it is not, it just does the wrong thing.
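To illustrate the point about assuming UTF-8, here is a minimal, self-contained C sketch. It is not Perl's implementation and uses none of the Perl API; the names utf8_skip, count_assuming_utf8 and count_chars are hypothetical. It steps through a buffer by lead-byte length, as a UTF-8 character counter typically does, and shows how the count goes wrong when the bytes are really Latin-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes a sequence occupies, judged only
 * from its first byte. Stray continuation bytes are stepped over one at
 * a time. */
static size_t utf8_skip(unsigned char lead)
{
    if (lead < 0x80) return 1;  /* ASCII */
    if (lead < 0xC0) return 1;  /* continuation byte out of place */
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
}

/* Counts characters on the assumption that the buffer holds UTF-8.
 * Nothing here checks that assumption. */
static size_t count_assuming_utf8(const unsigned char *s, size_t len)
{
    size_t chars = 0;
    size_t i = 0;

    assert(s != NULL);
    while (i < len) {
        i += utf8_skip(s[i]);   /* may step past the end on bad input */
        chars++;
    }
    return chars;
}

/* Counts characters given an explicit encoding flag: one character per
 * byte unless the caller says the buffer is UTF-8. */
static size_t count_chars(const unsigned char *s, size_t len, int is_utf8)
{
    assert(s != NULL);
    return is_utf8 ? count_assuming_utf8(s, len) : len;
}
```

With the three Latin-1 bytes E9 61 62, count_assuming_utf8 reports 1, because E9 looks like the lead byte of a three-byte sequence, whereas count_chars with is_utf8 set to 0 reports 3. With the UTF-8 bytes C3 A9 61 62, count_chars with is_utf8 set to 1 reports 3.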
For magical variables, it expects mg_length to return the number of characters, but I have already demonstrated that it does not. (mg_length, in fact, was ‘fixed’ to return

Since you have to know already that a string is in utf8 before you can call sv_len_utf8, but sv_len_utf8 might call get-magic which will change the utf8-ness, it really makes no sense as an API. Up till now, it has been consistently buggy with any magic scalars.

So now I think sv_len_utf8 should be modified to do exactly what the documentation says: return the number of characters. This could avoid the complex dance elsewhere involving DO_UTF8 checks and copies of magical variables, etc.

--
Father Chrysostomos

---
via perlbug: queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=114690