
Re: use encoding 'utf8' bug for Latin-1 range

From: demerphq
Date: February 27, 2008 01:13
Subject: Re: use encoding 'utf8' bug for Latin-1 range
Message ID: 9b18b3110802270113x1e982c5ag6aff53cfbec81fc8@mail.gmail.com

On 27/02/2008, Glenn Linderman <perl@nevcal.com> wrote:
> On approximately 2/26/2008 6:15 PM, came the following characters from
> the keyboard of Juerd Waalboer:
>
> > Glenn Linderman skribis 2008-02-26 15:16 (-0800):
>  >> Perhaps all uses in source code of characters outside of the ASCII range
>  >> should produce warnings in the 5.12, unless there is a pragma to specify
>  >> what locale/encoding.
>  >
>  > Sounds useful, but I personally don't think that just assuming "use
>  > utf8;" by default would be a problem if that would interpret invalid
>  > UTF-8 as latin1. Really, actual latin1 data that happens to also be
>  > valid UTF-8 is immensely rare in my experience. (Counter examples,
>  > anyone?) To further reduce the risk, the fallback could be done per line
>  > or per file, instead of per invalid sequence itself.
>  >
>  > (e.g. utf8::decode($_) for @source_lines;)
>  >
>  > In any case, I think that in 5.12, non-ASCII byte data should either
>  > warn (as you suggest) or be interpreted as utf8 with latin1 fallback
>  > (dmq's suggestion, but applied elsewhere), maybe also with a warning.
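
The per-line fallback Juerd describes can be sketched roughly as follows. This is only an illustration, not code from the thread or from perl itself: the helper name decode_lines_with_latin1_fallback and the sample strings are made up, and it relies on utf8::decode() returning false and leaving the string untouched when the bytes are not well-formed UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): try to decode each line as
# UTF-8. A line that is not well-formed UTF-8 is left as raw bytes, which
# Perl then treats as Latin-1 code points.
sub decode_lines_with_latin1_fallback {
    my @lines = @_;    # work on copies, not on the caller's strings
    for my $line (@lines) {
        # utf8::decode() works in place; on malformed input it returns
        # false and does not modify $line, which gives the fallback.
        utf8::decode($line);
    }
    return @lines;
}

my @decoded = decode_lines_with_latin1_fallback(
    "caf\xC3\xA9",    # UTF-8 bytes for "cafe" with e-acute
    "caf\xE9",        # the same word as Latin-1 bytes, invalid as UTF-8
);

# Both lines should now hold the same four characters.
print length($_), "\n" for @decoded;
```

A line of Latin-1 bytes that happens to also be valid UTF-8 would still be decoded as UTF-8 here, which is the case Juerd considers rare.
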
>
>
>
> We're in pretty close agreement on this point.  The unfortunate part is
>  that people with different locales may use character values of 128..255
>  without telling Perl.  When I speak of character values 128..255 I refer
>  not only to \x sequences but also the literal characters in the source file.
>
>
>
>  >> But maybe a replacement for "use encoding" should be implemented
>  >> simultaneously.
>  >
>  > I do not object to this, but I do question whether it's worth the tuits.
>  > Only the actual implementers can judge that.
>
>
>
> We're in total agreement on this.  I think the only practical way
>  forward is Unicode; UTF-8 being one encoding of Unicode.  A bit more
>  support for other Unicode encodings would be nice, but hard to put into
>  one bit, I guess.  So the program(mer) has to keep track of that part.
>
>
>
>  >> Implementing a special version of Perl on EBCDIC seems like a waste of
>  >> programmer productivity...
>  >
>  > Agreed, but again: those who implement things get to decide. It does,
>  > however, sometimes keep me from contributing! I'm glad that perlunitut
>  > and perlunifaq were accepted even though they pay no attention to EBCDIC
>  > at all. (It did delay my work, before I decided to simply ignore the
>  > entire EBCDIC world. I have not received even a single complaint about
>  > that.)
>  >
>  >> just default on EBCDIC platforms to "use encoding(EBCDIC);", decode
>  >> the source (and data) from EBCDIC to UTF-8, and charge onward with
>  >> UTF-8 internally.)
>  >
>  > I was told that it's not that simple, but I forgot why.

Because EBCDIC doesn't use UTF-8. It uses UTF-EBCDIC, which isn't
strictly part of Unicode, but that's IBM for you.

The rest of your and Juerd's mails are just too long for me to review
in detail, sorry.

Cheers,
Yves
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
