develooper Front page | perl.unicode | Postings from May 2004

Re: BOM and principle of least surprise

Larry Wall
May 10, 2004 10:02
On Mon, May 10, 2004 at 04:45:55PM +0100, Nick Ing-Simmons wrote:
: Larry Wall writes:
: >
: >Right now, the meaning of "text" is subject to severe distortions
: >due to legacy issues.  But in the long run, "text" is going to mean
: >Unicode, and that probably means a UTF-8 file encoding at least in
: >the western world, 
: Microsoft seem to be somewhat focused on some 16-bit form.

Yeah, well, they've never minded if you have to buy a new computer to
run their new software... :-)

: This thread started as complaint that perl5 can't read a 
: script saved as UCS-2/UTF-16 or whatever Windows uses.

That's why I said "probably".  And I probably should have said
"hopefully" instead.  :-)

But my main point was that "text" will eventually mean "Unicode",
whether or not that means "UTF-8".  (I probably should have
parenthesized the two subthoughts about what will end up the default
where.)  Really, though, once you've guaranteed a Unicode view at the
appropriate input boundaries, the differences between the various UTFs
should be fairly insignificant from a language point of view, provided
you maintain the abstractions.  The Perl 5 engine unfortunately doesn't
provide quite enough abstraction power to pull it off.  We're hoping to
do a better job of pulling it off with Perl 6, but that implies a more
strongly typed string implementation underneath than Perl 5 provides.
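
To illustrate the "Unicode view at the input boundary" idea, here is a minimal sketch. It is not part of the original message and is written in Python rather than Perl. The helper name decode_at_boundary and the UTF-8 fallback are assumptions made for the example. It sniffs a byte-order mark, decodes the bytes once, and hands back an abstract string, so nothing past that point needs to know which UTF the file was saved in.

```python
import codecs

# BOM table, longest marks first: the UTF-32 LE mark (FF FE 00 00)
# begins with the UTF-16 LE mark (FF FE), so UTF-32 must be tried first.
# Caveat: a UTF-16 LE file whose first character is U+0000 would be
# mistaken for UTF-32 LE by this simple sniffing.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_at_boundary(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: turn raw file bytes into a Unicode string.

    The codecs named in _BOMS consume the BOM themselves and use it to
    pick the byte order. Input without a BOM falls back to `default`,
    which is an assumption of this sketch, not a rule from the thread.
    """
    for bom, codec_name in _BOMS:
        if raw.startswith(bom):
            return raw.decode(codec_name)
    return raw.decode(default)


# The same script text saved three different ways...
script = 'print "h\u00e9llo";\n'
as_utf8 = script.encode("utf-8")
as_utf8_bom = codecs.BOM_UTF8 + script.encode("utf-8")
as_utf16 = codecs.BOM_UTF16_LE + script.encode("utf-16-le")

# ...is indistinguishable once it has crossed the input boundary.
decoded = [decode_at_boundary(b) for b in (as_utf8, as_utf8_bom, as_utf16)]
assert decoded[0] == decoded[1] == decoded[2] == script
```

The design choice here is that the encoding is dealt with exactly once, at the point where bytes enter the program. Everything downstream works on the decoded string, which is the abstraction the paragraph above says must be maintained.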

Perl's always been about providing reasonable defaults, and will
continue to do so.  But changing what's reasonable is tricky, and
sometimes you have to go through a period in which nothing can be
considered reasonable.

