perl.perl5.porters | Postings from February 2008

Re: use encoding 'utf8' bug for Latin-1 range

From: Rafael Garcia-Suarez
Date: February 21, 2008 02:34
Subject: Re: use encoding 'utf8' bug for Latin-1 range
Message ID: b77c1dce0802210234o2141a7f5q69845119b5d19471@mail.gmail.com

Juerd Waalboer wrote:
> encoding.pm has a broken design, and for that reason, any fix will
>  probably break almost all existing code using it.
>
>  Unfortunately, it applies \x escapes 00..ff before it decodes the source.
>  This means that for 8bit encodings, you can only use characters in the
>  latin1 range if the same character happens to be in the 0..255 range for
>  your chosen encoding. E.g. with "use encoding 'koi8r';" it is no longer
>  possible to have a literal é (U+00e9, eacute), not even with chr().
>
>  Because there are other problems with encoding.pm, that can also not be
>  fixed without breaking backward compatibility, I suggest the following
>  simple 4 step plan for the future, that is backwards compatible:
>
>  0. keep encoding.pm and ${^ENCODING} (the actual problem) broken
>  1. deprecate encoding.pm; complain loudly with a mandatory warning
>  2. do the same for ${^ENCODING}
>  3. advocate the use of utf8 and "use utf8" for non-latin1 source code
>  4. strongly discourage the use of non-latin1 non-utf8 source code
>  5. modify open.pm to provide a way to set *only* STDIN and STDOUT

I agree with that plan. However, there is useful code out there, and
in the core, that uses ${^ENCODING}: notably the encoding::warnings
module. That's why fixing how and when the ${^ENCODING} methods are
called might be worthwhile after all. So, no deprecation of
${^ENCODING} for now.

As for using "exotic" encodings in source code, encoding::source (on
CPAN) eliminates some of the flaws of encoding.pm, but not this one,
which comes from the implementation of ${^ENCODING} in the core and
from how and when its methods are called.
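
To make the ordering problem concrete, here is a rough model of it. It is
written in Python rather than Perl, since it only illustrates the byte-level
effect and is not encoding.pm's actual code path. The function names are
made up for this sketch; the assumption, taken from Juerd's description, is
that a \x escape first becomes a byte and that byte is then decoded with the
source encoding.

```python
# Hypothetical model of "apply \x escapes, then decode the source".
# This is NOT how encoding.pm is implemented; it only mimics the effect.

def escape_then_decode(byte_value: int, source_encoding: str) -> str:
    """The \\xNN escape yields a byte, which is then decoded as source text."""
    return bytes([byte_value]).decode(source_encoding, errors="replace")

def escape_as_code_point(byte_value: int) -> str:
    """What one would expect instead: \\xNN names code point U+00NN."""
    return chr(byte_value)

intended = escape_as_code_point(0xE9)          # U+00E9, e-acute
actual = escape_then_decode(0xE9, "koi8_r")    # byte 0xE9 read as KOI8-R

# Is there any single byte that decodes to e-acute under KOI8-R?
reachable = any(
    escape_then_decode(b, "koi8_r") == intended for b in range(256)
)

print(f"intended U+{ord(intended):04X}, actual U+{ord(actual):04X}")
print("e-acute reachable via a \\x escape:", reachable)
```

Under this model the escape for 0xE9 comes out as a Cyrillic letter, and no
byte value maps to U+00E9 at all, which matches the koi8r example quoted
above.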

-- 
It is always possible to aglutenate multiple separate problems into a
single complex interdependent solution. In most cases this is a bad idea.
    -- RFC 1925
