perl.perl5.porters | Postings from September 2012

Re: Length magic should die ([perl #114690] Bleadperl v5.17.3-204-g864329c breaks VPIT/Variable-Magic-0.51.tar.gz)

Father Chrysostomos via RT
September 26, 2012 17:31
On Wed Sep 26 10:21:50 2012, sprout wrote:
> On Wed Sep 26 10:10:44 2012, sprout wrote:
> > On Wed Sep 26 09:08:38 2012, wrote:
> > > 
> > > > Even if we fix length magic to set the utf8 flag we will still have a
> > > > problem with get-magic being called too many times in those cases where
> > > > length magic falls back to get-magic.  There is no way to tell
> > > > afterwards whether length magic resorted to get-magic or not.
> > > 
> > > One possible solution would be to replace mg_length() by a new
> > > mg_length_flags() function that would take a 'flags' parameter telling
> > > it whether it can call 'get' magic if no 'len' magic is available.
> > > I'm not sure it's worth the trouble though.
> > 
> > Probably not.
> > 
> > Anyway, this is worse than I thought:
> > 
> > sv_len calls mg_length on gmagical variables, and returns the byte
> > length otherwise.
> > 
> > mg_length calls length magic if available, and returns the number of
> > *characters* otherwise.
> > 
> > Length magic on match vars (the only scalar length magic) returns the
> > length in bytes.
> > 
> > I’ve locally changed sv_len to stop calling mg_length, but mg_length is
> > an API function, so I cannot just remove it.  But what *should* it do? 
> > It currently returns bytes for $1 but characters for $^A.
> Correction: $^A does have length magic (but Perl_magic_len, which
> handles it, falls back to get-magic and SvCUR), so mg_length returns
> bytes for $^A.  But it returns characters for substr lvalues.
> Also, I have just noticed that when Perl_magic_len falls back to SvCUR,
> it only does so for variables that are SvPOK after sv_2pv, and returns 0
> otherwise.  (That means pos can be set on $_, but not $/, when it
> contains a reference or glob).

The plot thickens.

sv_len_utf8 does not make sense.  It assumes that the string is UTF-8. 
If it is not, it just does the wrong thing.  For magical variables, it
expects mg_length to return the number of characters, but I have already
demonstrated that it does not.  (mg_length, in fact, was ‘fixed’ to return 

Since you have to know already that a string is in utf8 before you can
call sv_len_utf8, but sv_len_utf8 might call get-magic which will change
the utf8-ness, it really makes no sense as an API.  Up till now, it has
been consistently buggy with any magic scalars.
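
To make the "assumes that the string is UTF-8" failure concrete, here is a
minimal sketch in plain C.  It is a toy model, not perl's code: toy_sv,
toy_count_utf8 and toy_len_assume_utf8 are hypothetical names, and the
counter just skips UTF-8 continuation bytes rather than doing what perl's
own UTF-8 routines do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scalar; this is NOT perl's SV. */
typedef struct {
    const unsigned char *buf; /* string buffer */
    size_t cur;               /* length in bytes (think SvCUR) */
    int is_utf8;              /* encoding flag (think SvUTF8) */
} toy_sv;

/* Count characters on the assumption that s is well-formed UTF-8:
 * every byte that is not a continuation byte (10xxxxxx) starts one. */
static size_t toy_count_utf8(const unsigned char *s, size_t n)
{
    size_t chars = 0;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* Models the problem described above: the flag is never consulted,
 * so a Latin-1 buffer is miscounted. */
static size_t toy_len_assume_utf8(const toy_sv *sv)
{
    assert(sv != NULL);
    return toy_count_utf8(sv->buf, sv->cur);
}
```

With the two bytes 'a', 0xA9 and the flag off (two Latin-1 characters), this
toy counter reports 1, because 0xA9 looks like a continuation byte.  With
'a', 0xC2, 0xA9 and the flag on, it reports 2, which is right.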

So now I think sv_len_utf8 should be modified to do exactly what the
documentation says: return the number of characters.  This could avoid
the complex dance elsewhere involving DO_UTF8 checks and copies of
magical variables, etc.
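
A rough sketch of what "return the number of characters" could mean, again
as a toy model in plain C rather than a patch.  toy_sv and toy_len_chars are
hypothetical names.  The one assumption that matters is that the flag is read
at the moment of counting; in the real function that would have to be after
any get-magic has run, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scalar; this is NOT perl's SV. */
typedef struct {
    const unsigned char *buf; /* string buffer */
    size_t cur;               /* length in bytes (think SvCUR) */
    int is_utf8;              /* encoding flag (think SvUTF8) */
} toy_sv;

/* Always answer in characters, whatever the encoding:
 *  - flag off: one byte is one character, so the byte length is the answer;
 *  - flag on:  count the bytes that are not UTF-8 continuation bytes. */
static size_t toy_len_chars(const toy_sv *sv)
{
    assert(sv != NULL);
    if (!sv->is_utf8)
        return sv->cur;

    size_t chars = 0;
    for (size_t i = 0; i < sv->cur; i++) {
        if ((sv->buf[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

A caller of something shaped like this would not need its own DO_UTF8 check
before asking for a length: 'a', 0xA9 with the flag off and 'a', 0xC2, 0xA9
with the flag on both come back as 2.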


Father Chrysostomos

via perlbug:  queue: perl5 status: open
