develooper Front page | perl.perl5.porters | Postings from May 2013

Re: SV_CHECK_THINKFIRST() needs to be rethought and further COWwoes....

Thread Previous | Thread Next
From:
Nicholas Clark
Date:
May 9, 2013 17:18
Subject:
Re: SV_CHECK_THINKFIRST() needs to be rethought and further COWwoes....
Message ID:
20130509171818.GL3729@plum.flirble.org
On Thu, May 09, 2013 at 10:02:44AM -0700, Father Chrysostomos wrote:
> 
> On May 6, 2013, at 4:56 AM, Dave Mitchell <davem@iabyn.com> wrote:
> 
> > On Mon, Mar 25, 2013 at 01:20:32PM +0100, demerphq wrote:
> >> On 25 March 2013 13:13, Dave Mitchell <davem@iabyn.com> wrote:
> >>> On Mon, Mar 25, 2013 at 10:14:18AM +0100, demerphq wrote:
> >>>> To make things worse btw, IMO we have a real issue with COW, in that
> >>>> a) COW strings no longer have the READONLY flag set, code that wants
> >>>> to do an inplace modification based on this flag not being set will
> >>>> silently start modifying COW strings.
> >>>> 
> >>>> Lastly, I thought COW was disabled, but I see lots of code related to
> >>>> it. Devel::Peek still reports keys() as being COW, etc.
> >>>> 
> >>>> I think we have a real mess on our hands here. I feel really pessimistic.
> >>> 
> >>> My (vague) understanding is that although general COW strings have been
> >>> disabled, the pre-existing  shared hash key mechanism is still there, but
> >>> rather than using the FAKE/READONLY flag combination, it uses the new IsCOW
> >>> flag to indicate the sharedness
> >> 
> >> Thats my point. With FAKE+READONLY code like HTML::Entities breaks in
> >> an annoying but from the internals point of view harmless way. IOW, it
> >> refuses to corrupt the PV that holds the key.
> >> 
> >> Now on Perl5.16 the "annoying" part is gone, but as the SV is no
> >> longer READONLY something like HTML::Entities will think it is normal,
> >> and then modify the PV. Assuming modifying the PV doesnt cause a SEGV
> >> something like HTML::Entities will silently corrupt the PL_strtab (the
> >> master hash table) and any hash that mentions that key prior to it
> >> being corrupted. From that point on Perl is in undefined state. The
> >> master string table will contain a HEK that contains a string which
> >> does not match the hash value for the string, and at that point almost
> >> anything could happen.
> >> 
> >> IOW, FAKE+READONLY still told code "be careful, here be dragons"
> >> whereas "COW" doesn't say this at all.
> > 
> > So, the question is what we do for 5.18? I'm not really comfortable enough
> > with understanding how CPAN modules expect and handle FAKE+READONLY to
> > know whether we should revert this change for 5.18 (or even how easy it is
> > to revert). Or whether we should do both: have the isCOW flag, but also
> > set FAKE+READONLY? Or perhaps do both for hash keys, but leave alone
> > in the (disabled) proper string COW code?
> > 
> > FC, do you have any thoughts?
> 
> Originally, SvREADONLY meant that a scalar was read-only, that $x=3 would croak.  When shared hash key scalars came along, this bit gained a new meaning, that the string buffer is read-only, even if the scalar itself is modifiable.  The result was that many XS modules ended up being subtly wrong.
> 
> I did an audit of XS modules using SvREADONLY (which, admittedly, may not have been as exhaustive as I had hoped).  In most cases, XS modules were erroneously rejecting scalars which (from Perl space at least) were not read-only.  By stopping COW from using that flag, I automatically fixed those.  Of the remaining modules, most were buggy either way and would need to be fixed regardless of whether perl used SvREADONLY for COW.

Yes, but arguably it also changes the failure mode from "right-side" to
"wrong-side". In that, before, XS code that was unaware of the 5.8.0
change would fail without corrupting internal state. Whereas now, XS code
that is unaware will write to memory it should not, and failure will be
subtle, and potentially much later and in unrelated code.

XS code that uses C API calls such as sv_setpv(), set_pvn() and similar
will always work. The problem is that there's this massive grey area about
how much of the internal representation XS code is permitted to rely on,
which makes it really difficult to do any sort of significant rearranging
of the internals, because a lot of code on CPAN makes a lot of assumptions
about what it's legal to do, and those assumptions have remained unchanged/
unchallenged in over a decade.

In turn this comes back to the basic problem that there never was an "API"
designed. XS has "happened" as a side effect of authors looking at the
internals and copying bits that worked. And those internals make a lot of
direct data structure accesses, rather than vectoring through function calls,
which means that

1) the internal data structures are exposed
2) it's hard to impossible to intercept accesses and create compatibility
   veneers


I'm not anything is wrong or right here. I'm saying that anything is going
to have to be a compromise, based on guesswork, judgement calls and possibly
even luck.


> So, in short, I think HTML::Entities needs to be corrected.
> 
> We need to encourage people to use SvPV_force or sv_force_normal before assuming that a PV is modifiable.

Completely agree.


Nicholas Clark

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About