perl.perl5.porters | Postings from November 2010

Re: "perl: utf8.c:1997: Perl_swash_fetch: Assertion `klen <= sizeof(PL_last_swash_key)' failed." [5.12.1]

From: Nicholas Clark
Date: November 27, 2010 06:15
Subject: Re: "perl: utf8.c:1997: Perl_swash_fetch: Assertion `klen <= sizeof(PL_last_swash_key)' failed." [5.12.1]
Message ID: 20101127141541.GR24189@plum.flirble.org

On Sat, Nov 27, 2010 at 07:01:57AM -0700, Tom Christiansen wrote:
> Nick wrote:
> 
> >>> (WRONG in the general case. It feels like an awful lot of end-user
> >>> code to deal with encodings is heuristics and bodgery, rather than
> >>> actual understanding)
> 
> >> Very true, and a source of perpetual annoyance.  But it's a separate
> >> issue, isn't it?
> 
> > Not in my mind. Finding the need to resort to flipping the internal
> > flag for UTF-8 is a red flag that the proper conversion layer isn't
> > implemented, because the flow of data hasn't been thought about.
> 
> It does leave a code-smell, doesn't it?  I've always been uncomfy
> with it, but I don't know what else to do.  Could you please tell
> me how I *should* then be writing the unless test and block at
> the bottom of this code snippet:

What is this code trying to do? It's not obvious to me.

>     for my $codepoint ( $first_codepoint .. $last_codepoint ) {
> 
>         # gaggy UTF-16 surrogates are invalid UTF-8 code points
>         next if $codepoint >= 0xD800 && $codepoint <= 0xDFFF;
> 
>         # from utf8.c in perl src; must avoid fatals in 5.10
>         next if $codepoint >= 0xFDD0 && $codepoint <= 0xFDEF;
> 
>         # both FFFE and FFFF are "not characters" in any plane
>         next if 0xFFFE == ($codepoint & 0xFFFE);
> 
>         # see "Unicode non-character %s is illegal for interchange" in perldiag(1)
>         $_ = do { no warnings "utf8"; chr($codepoint) };
> 
>         # fixes "the Unicode bug"
>         unless (utf8::is_utf8($_)) {
>             $_ = decode("iso-8859-1", $_);
>         }

And (unless I'm missing something) the code as-is *isn't* flipping the
internal flag, so there's no way it can leave internal structures in an
inconsistent state.
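
A minimal sketch of that distinction, assuming a stock Encode and nothing
from the rest of the original program (the variable names here are made up
for illustration): decode() returns a new string and leaves its argument
alone, and utf8::upgrade() only changes how the same characters are stored.
Neither one reinterprets existing bytes the way forcing the flag on would.

```perl
use strict;
use warnings;
use Encode qw(decode);

# chr(0xE9) is one character that perl stores as a single byte,
# so the internal UTF-8 flag starts out off.
my $char = chr(0xE9);
my $flag_before = utf8::is_utf8($char) ? 1 : 0;

# decode() with the default CHECK returns a new string; $char is untouched.
my $decoded = decode("iso-8859-1", $char);
my $flag_source_after = utf8::is_utf8($char)    ? 1 : 0;
my $flag_decoded      = utf8::is_utf8($decoded) ? 1 : 0;

# utf8::upgrade() converts the internal storage in place, but the
# string still holds exactly the same character.
my $upgraded = $char;
utf8::upgrade($upgraded);
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

# All three compare equal as character strings.
my $same = ($char eq $decoded && $char eq $upgraded) ? 1 : 0;

# "The Unicode bug": without the unicode_strings feature, \w applies
# Unicode rules only when the string is stored as UTF-8 internally.
my $word_before = ($char    =~ /\w/) ? 1 : 0;
my $word_after  = ($decoded =~ /\w/) ? 1 : 0;

printf "flag before=%d, source after decode=%d, decoded=%d, upgraded=%d\n",
    $flag_before, $flag_source_after, $flag_decoded, $flag_upgraded;
printf "same characters=%d, \\w before=%d, \\w after=%d\n",
    $same, $word_before, $word_after;
```

If all that is wanted is the change of internal representation,
utf8::upgrade($_) would do it without going through Encode.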

Nicholas Clark
