perl.perl5.porters | Postings from November 2010

Re: "perl: utf8.c:1997: Perl_swash_fetch: Assertion `klen <= sizeof(PL_last_swash_key)' failed." [5.12.1]

From: Nicholas Clark
Date: November 27, 2010 04:06
Subject: Re: "perl: utf8.c:1997: Perl_swash_fetch: Assertion `klen <= sizeof(PL_last_swash_key)' failed." [5.12.1]
Message ID: 20101127120629.GQ24189@plum.flirble.org

On Fri, Nov 26, 2010 at 01:03:20PM -0800, Reverend Chip wrote:
> On 11/26/2010 2:25 AM, Nicholas Clark wrote:
> > On Fri, Nov 26, 2010 at 02:20:40AM -0800, Reverend Chip wrote:
> >> On 11/26/2010 1:23 AM, Nicholas Clark wrote:
> >>> Isn't the bug that perl let someone create an invalid data structure?
> >> That's an internally consistent position (no pun intended).  But does
> >> the utf8 flag truly count as internal if manipulating it is both easy
> >> and well-documented for users?
> > easy (yes, too easy), documented (maybe, not well enough, particularly about
> > what it's about) and WRONG.
> 
> You seriously equate Encode::_utf8_on() with, say, playing around with
> optrees using B?  You seriously equate a bad pointer in an SV to a
> misplaced byte in a utf8 string?

Yes. Totally.

It's documented as

    [INTERNAL] Turns on the UTF8 flag in STRING.  The data in STRING is
    B<not> checked for being well-formed UTF-8.  Do not use unless you
    B<know> that the STRING is well-formed UTF-8.

and the leading underscore is a convention too for "internal use".

I'd really prefer that it didn't exist at all.

> > (WRONG in the general case. It feels like an awful lot of end-user code to
> > deal with encodings is heuristics and bodgery, rather than actual
> > understanding)
> 
> Very true, and a source of perpetual annoyance.  But it's a separate
> issue, isn't it?

Not in my mind. Finding the need to resort to flipping the internal flag for
UTF-8 is a red flag that the proper conversion layer isn't implemented,
because the flow of data hasn't been thought about.

> >> As a separate matter, perhaps we can at least agree that assert() is an
> >> unfriendly thing for Perl to do in this case [...]
> > Where do you stop?
> 
> Well, I wrote "in this case", so we would stop here.  It would be a
> concession to usability based on manipulation of the utf8 flag being
> easy and documented (as you acknowledged).

No, I don't think that we'd stop "here", where "here" is that part of the
regexp engine. To be sure that we don't SEGV or fail assertions anywhere in
the codebase if buffers are marked with SvUTF8() when they are not valid UTF-8,
we'd have to check AT EVERY PLACE that they are what they say they are.

Which I don't think is viable.
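
To give a sense of what each such check would involve, here is a minimal
sketch in C of a well-formedness scan. The function name is_wellformed_utf8
is hypothetical and is not part of the perl core. It applies the strict
RFC 3629 rules, which are an assumption for illustration and not necessarily
the rules perl applies to its internal encoding. The point is only that the
check is a linear pass over the whole buffer, which would have to be repeated
at every place that trusts the flag.

```c
#include <stddef.h>

/* Hypothetical helper, NOT a perl core function: returns 1 if the n bytes
 * at s are well-formed UTF-8 under strict RFC 3629 rules, 0 otherwise. */
static int is_wellformed_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* code point accumulated so far */
        unsigned long min;  /* smallest code point valid at this length */

        if (c < 0x80) {                 /* ASCII: a single byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (n - i <= extra)
            return 0;   /* sequence runs past the end of the buffer */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogate range */
        if (cp > 0x10FFFF)
            return 0;   /* beyond the Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

Under these assumptions, the bytes 'a', 0xC3, 0xA9 pass the check, while a
buffer ending in a lone 0xC3, or one starting with a lone 0xA9, is rejected.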

Nicholas Clark
