
Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

Juerd Waalboer
March 30, 2007 14:26
Marvin Humphrey wrote 2007-03-30 14:00 (-0700):
> >Perl does not have strong typing.
> If it is so deadly to collide byte-oriented data with character data,  
> it should not be so easy to do so accidentally.

I agree. But Perl chose to have the same single data type for all
strings, and to maintain compatibility with older Perls by assuming that
your byte string is a latin1 string if you start using it as a text
string. After all, in a strictly 8 bit world, there's no need for a
distinction, so people were never careful about it.

(Well, there was a need, but ignorance being bliss, ignoring it was
better for everyone's sanity.)
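
The Latin-1 assumption described above can be seen in a few lines. This
is only a minimal sketch, not code from the thread: the variable names
are made up for illustration, and it assumes the core Encode module is
available. A string of UTF-8 bytes is joined to a text string, first
without decoding and then after decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded bytes for U+00E9 (e with acute): two bytes, 0xC3 0xA9.
my $bytes = "\xC3\xA9";

# A text string holding one character above 0xFF (U+263A).
my $text = "\x{263A}";

# Mixing the two: Perl treats each byte of $bytes as a Latin-1
# character, so the result has three characters, not two.
my $mixed        = $text . $bytes;
my $mixed_length = length $mixed;

# Decoding the bytes first gives the intended two-character string.
my $decoded       = decode('UTF-8', $bytes);
my $joined        = $text . $decoded;
my $joined_length = length $joined;

printf "mixed: %d characters, joined: %d characters\n",
    $mixed_length, $joined_length;
```

No warning is issued for the undecoded concatenation, which is why the
collision is so easy to cause by accident.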

It kind of bothers me that people constantly whine about this decision
years after it was made. The time to influence the decision has passed. It
just seems so counter-productive to keep bringing it up, while there are
bugs to be discovered and fixed.

I wasn't active in p5p back then, and if I had been, I would probably
not have foreseen the consequences, just like the porters then didn't.
But wonderfully, a rather consistent and usable plus useful model was
invented, with better/easier Unicode/encodings support than any other
programming language. Of course it's never good enough, but let's first
focus on finding and fixing bugs.

> That so many users, including those as expert as Marc, possess a 
> "broken" understanding of Perl's Unicode model suggests a flawed
> design.

I think the design is solid, but the implementation (see regex) is
slightly broken and the documentation wildly misleading.

The documentation thing I'm trying to fix with perlunitut, perlunifaq,
and a lot of changes to existing documentation, all of which are now
part of bleadperl and will probably be part of the next Perl release.

In addition, I'm maintaining a concise list of best practices, and
spending tuits on teaching people
(including module maintainers) about the One Way To Do It, because there
is, in fact, just one way that really works well in this case. You just
have to find it, and stick to it. TIMTOWTDI doesn't always apply.
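
The message does not spell out what that one way is. The perlunitut
document mentioned above describes it as decoding everything that comes
in, working on text strings, and encoding everything that goes out. The
following is a hedged sketch of that flow, not the author's own code;
the input bytes and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: UTF-8 bytes as they might arrive from a file or
# socket. This is "naive" with a diaeresis on the i: 6 bytes.
my $input_bytes = "na\xC3\xAFve";

# 1. Decode on input: bytes become a text string of 5 characters.
my $text       = decode('UTF-8', $input_bytes);
my $char_count = length $text;

# 2. Work on text: string operations now see characters, not bytes.
my $upper = uc $text;

# 3. Encode on output: text becomes bytes again before it leaves.
my $output_bytes = encode('UTF-8', $upper);

printf "%d bytes in, %d characters, %d bytes out\n",
    length $input_bytes, $char_count, length $output_bytes;
```

Keeping decoding and encoding at the edges of the program means that
everything in between deals only with text strings, so byte data and
character data never meet by accident.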

> We have been set up to fail.

Maybe so, but you haven't given up yet, and I hope you won't. Please
join us in the effort to deal with the problems at hand. It's a hell of
a lot more productive than praying for the opportunity to undo recent
years of Perl.

Surely you must know a way in which Perl's Unicode support can be
improved, or accidents avoided, without trying to change all of Perl,
CPAN, and a gazillion lines of code that we can't even reach. Let's hear
it! :)

Kind regards,

  juerd waalboer:  perl hacker
  convolution:     ict solutions and consultancy

I do not trust voting computers.
See <>.
