perl.perl5.porters | Postings from November 2010

Re: "perl: utf8.c:1997: Perl_swash_fetch: Assertion `klen <= sizeof(PL_last_swash_key)' failed." [5.12.1]

Tom Christiansen
November 27, 2010 06:03
Nick wrote:

>>> (WRONG in the general case. It feels like an awful lot of end-user
>>> code to deal with encodings is heuristics and bodgery, rather than
>>> actual understanding)

>> Very true, and a source of perpetual annoyance.  But it's a separate
>> issue, isn't it?

> Not in my mind. Finding the need to resort to flipping the internal
> flag for UTF-8 is a red flag that the proper conversion layer isn't
> implemented, because the flow of data hasn't been thought about.

It does leave a code-smell, doesn't it?  I've always been uncomfy
with it, but I don't know what else to do.  Could you please tell
me how I *should* then be writing the unless test and block at
the bottom of this code snippet:

    use Encode qw(decode);    # provides decode() used below

    for my $codepoint ( $first_codepoint .. $last_codepoint ) {

        # gaggy UTF-16 surrogates are invalid UTF-8 code points
        next if $codepoint >= 0xD800 && $codepoint <= 0xDFFF;

        # from utf8.c in perl src; must avoid fatals in 5.10
        next if $codepoint >= 0xFDD0 && $codepoint <= 0xFDEF;

        # both FFFE and FFFF are "not characters" in any plane
        next if 0xFFFE == ($codepoint & 0xFFFE);

        # see "Unicode non-character %s is illegal for interchange" in perldiag(1)
        $_ = do { no warnings "utf8"; chr($codepoint) };

        # fixes "the Unicode bug"
        unless (utf8::is_utf8($_)) {
            $_ = decode("iso-8859-1", $_);
        }

        # ... remainder of the loop body (not shown) ...
    }

Especially given that this code must run on 5.10 and better, not
just blead, I don't know how else to do it.  "Should" I be calling
pack("U", $codepoint) or something?
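For comparison, here is a minimal sketch. It assumes Perl 5.10 or later with the core Encode module, and it sets three candidate spellings of that unless block side by side. The first is the decode("iso-8859-1", ...) call from the snippet. The second is the pack("U", ...) variant asked about above. The third is the core function utf8::upgrade, which changes a string's internal representation in place without changing its characters. The variable names are illustrative only. The sketch shows just that all three produce the same one-character string with the internal UTF-8 flag set. It does not address where a proper decoding layer belongs.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative code point below 256: LATIN SMALL LETTER E WITH ACUTE.
my $codepoint = 0xE9;

# 1. chr() alone: for code points below 256 this yields a string stored
#    as bytes (UTF-8 flag off), which is what the unless block patches up.
my $plain = chr($codepoint);

# 2. The snippet's approach: decode the single Latin-1 octet.
my $via_decode = decode("iso-8859-1", chr($codepoint));

# 3. pack with a pattern starting with "U" yields a UTF-8-flagged string.
my $via_pack = pack("U", $codepoint);

# 4. Core utf8::upgrade: converts the storage in place to UTF-8,
#    leaving the characters themselves unchanged.
my $via_upgrade = chr($codepoint);
utf8::upgrade($via_upgrade);

# Report length, code point, internal flag, and character equality.
for my $pair (
    [ "chr"     => $plain ],
    [ "decode"  => $via_decode ],
    [ "pack U"  => $via_pack ],
    [ "upgrade" => $via_upgrade ],
) {
    my ($label, $str) = @$pair;
    printf "%-8s length=%d ord=0x%X is_utf8=%s same_as_chr=%s\n",
        $label, length($str), ord($str),
        (utf8::is_utf8($str) ? "yes" : "no"),
        ($str eq $plain ? "yes" : "no");
}
```

Because `eq` compares characters rather than bytes, each variant compares equal to the plain chr() result. Only the internal representation differs.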
