perl.perl5.porters | Postings from September 2012

Length magic should die ([perl #114690] Bleadperl v5.17.3-204-g864329c breaks VPIT/Variable-Magic-0.51.tar.gz)

From: Father Chrysostomos via RT
Date: September 25, 2012 16:04
Subject: Length magic should die ([perl #114690] Bleadperl v5.17.3-204-g864329c breaks VPIT/Variable-Magic-0.51.tar.gz)
Message ID: rt-3.6.HEAD-11172-1348614257-1961.114690-15-0@perl.org

On Tue Sep 11 21:40:39 2012, sprout wrote:
> On Tue Sep 11 20:36:54 2012, sprout wrote:
> > On Mon Sep 10 15:31:28 2012, perl@profvince.com wrote:
> > > The test that started to fail with this commit does a substr() on a
> > > Unicode string that has get and len magic callbacks. Before the change,
> > > len magic was called; now it no longer is. Is this the expected
> > > behaviour after this change? I have no idea what is
> > > supposed to be correct, but I'd like to double-check so that nothing
> > > bad goes unnoticed.
> > 
> > I don’t fully understand len magic.  I think its purpose is to optimise
> > the retrieval of the length, so whether it does or does not get called
> > is likely to change willy-nilly over time.
> 
> I have just had a look at mg_length.  It falls back to get-magic
> (indirectly via SvPV_const) in the absence of length magic.  So it seems
> to me it is never correct for mg_get and mg_length both to be
> called for the same operation.
> 
> So you can go ahead and change your tests.
> 
> In fact, since mg_length may turn on the UTF8 flag, yet it does not
> provide the pv, most uses of mg_length are probably wrong.

Actually, I am probably going to yank most uses of length magic from the
core, so you might want to hold off on making a release, since I suspect
you’ll have to adjust other tests, too.

Length magic is used both on arrays and on scalars.

The only use on scalars in the perl core is on match vars.

Perl_magic_setpos does this:

    len = SvPOK_nog(lsv) ? SvCUR(lsv) : sv_len(lsv);

and shortly thereafter:

    if (DO_UTF8(lsv)) {
        ulen = sv_len_utf8_nomg(lsv);
        if (ulen)
            len = ulen;
    }

But length magic on regexp vars does not set the utf8 flag.  So we end
up with discrepancies like this:

$ perl -le '"\x{100}a" =~ /(..)/; pos($1) = 2; print pos($1); "$1"; print pos($1)'
2
1

And nonsense like this:

$ perl -le '"\x{100}a" =~ /(.)/; pos($1) = 2; print pos($1); "$1"; print pos($1)'
1
Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
0
0
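
For reference, here is a small sketch of the two lengths that can
disagree for the captures used in the one-liners above: the character
count and the count of UTF-8 octets.  The helper name char_and_byte_len
is made up for illustration and is not part of the core; this only
shows the numbers involved and does not reproduce the code path in
Perl_magic_setpos.

```perl
use strict;
use warnings;

# Hypothetical helper: return (character count, UTF-8 byte count).
sub char_and_byte_len {
    my ($str) = @_;
    my $chars = length $str;   # length in characters
    my $octets = $str;         # work on a copy so $str is untouched
    utf8::encode($octets);     # turn characters into UTF-8 octets
    return ($chars, length $octets);
}

# The captures from the two examples: /(..)/ and /(.)/ on "\x{100}a".
for my $capture ("\x{100}a", "\x{100}") {
    my ($chars, $bytes) = char_and_byte_len($capture);
    printf "chars=%d bytes=%d\n", $chars, $bytes;
}
# prints:
# chars=2 bytes=3
# chars=1 bytes=2
```

A length reported in bytes for a string that is later treated as
characters (or the other way round) gives different limits for pos(),
which is consistent with the outputs shown above.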

Even if we fix length magic to set the utf8 flag we will still have a
problem with get-magic being called too many times in those cases where
length magic falls back to get-magic.  There is no way to tell
afterwards whether length magic resorted to get-magic or not.
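
Extra get-magic calls are visible to user code.  As a rough sketch,
assuming a tied scalar as a Perl-level stand-in for get-magic (the
class name CountingScalar is invented for this example), every read
triggers FETCH, so a redundant read shows up as an extra call:

```perl
use strict;
use warnings;

package CountingScalar;    # hypothetical class, for illustration only

sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}

sub FETCH {
    my ($self) = @_;
    $self->{fetches}++;    # record each get-magic invocation
    return $self->{value};
}

sub STORE {
    my ($self, $value) = @_;
    $self->{value} = $value;
}

package main;

tie my $s, 'CountingScalar', "\x{100}a";
my $counter = tied $s;

my $first  = $s;           # one read, one FETCH
my $second = $s;           # a second read, a second FETCH

print "FETCH calls: $counter->{fetches}\n";   # prints "FETCH calls: 2"
```

If FETCH has side effects or is expensive, an operation that reads the
value twice where once would do is an observable change in behaviour.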

So I don’t see how length magic on scalars has ever worked since the
advent of utf8.

-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=114690
