develooper Front page | perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

From:
Jarkko Hietaniemi
Date:
March 30, 2007 11:34
Subject:
Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)
Message ID:
aa5b09f00703301133s2fdb3810w9fb90a75be8cfe67@mail.gmail.com
> Could you tell me why almost every other 5.6 bug was fixed in 5.8, but
> gratuitous breakage of large parts of CPAN is accepted with this change?
> What's the rationale behind keeping this 5.6 bug, while fixing the rest?
>
> So why not fix it? Nobody made such a fuss when they fixed the remaining bugs
> from 5.6.

Oh, for heaven's sake.  I'm sorry, but I have a VERY hard time
listening to your wailing and sitting still.  So I won't.

Perl 5.8 was in development for close to two years (5.7.0 in
2000-Sep, though work had already started in July or so; 5.8.0 in
2002-Jul), and 5.8.1 (the "cleanup for oopses" for 5.8.0) took another
year.  So it was three years before we really had a usable 5.8.

Since then Nicholas has picked it up and has admirably and thanklessly
released SEVEN maintenance releases of 5.8 over three years, meaning
that about every six months there has been a chance of fixing
something that is very broken.

How serious can a breakage be if, in three years of development and
three years of maintenance, it hasn't gotten enough attention to be
fixed?  There is no hidden conspiracy to keep things broken.

I'm the first one to admit that I wasn't brave enough to REALLY fix
the Unicode brokenness of 5.6.

(1) The more strongly typed scheme, with really forcibly separate
"byte strings" and "Unicode strings", *would* have been possible, if
only I had had the guts.  But it was mostly the regex engine that
scared me too much.  For basic string manipulation and I/O it would
not have been a problem to implement.
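
As a rough illustration of what point (1) describes, here is a sketch
in Python 3. Python is used only as an analogy, because it happens to
keep byte strings and Unicode strings forcibly separate; this is not
Perl, and it says nothing about how such a scheme would have been
implemented in 5.8. The function names are hypothetical.

```python
def join_strict(text: str, raw: bytes, encoding: str) -> str:
    """Combine a Unicode string with a byte string by decoding explicitly."""
    # The caller has to name the encoding; nothing is guessed or upgraded
    # behind the scenes.
    return text + raw.decode(encoding)


def mixing_is_rejected() -> bool:
    """Return True if concatenating str and bytes directly raises TypeError."""
    try:
        "caf" + b"\xc3\xa9"  # type: ignore[operator]
    except TypeError:
        return True
    return False
```

With separate types, the mixed concatenation fails immediately instead
of producing a string whose interpretation depends on an internal flag.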

(2) Another big mistake (due to lack of courage) was the decision to
stick with "Latin-1" as the default 8-bit legacy "type".  I should
have broken that assumption, too, and stuck with pure ASCII (or
EBCDIC).  (As a side thing, the 8-bit locale support should have been
ejected, too.  It is just not worth the trouble, and it should have
been replaced with something pluggable so that people could have
plugged in CLDR or Windows locales or whatever they want.)
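
To make the difference in point (2) concrete, here is a small Python
sketch (again only an analogy, not Perl code). Treating legacy 8-bit
data as Latin-1 accepts every byte value and silently maps it to a
code point, while a pure-ASCII default refuses anything above 0x7F and
so forces the real encoding to be stated. The function names are
hypothetical.

```python
def upgrade_as_latin1(raw: bytes) -> str:
    """Interpret legacy 8-bit data as Latin-1."""
    # Every byte 0x00-0xFF maps to the code point with the same number,
    # so this never fails, even when the bytes are really UTF-8.
    return raw.decode("latin-1")


def upgrade_as_ascii(raw: bytes) -> str:
    """Interpret legacy 8-bit data as pure ASCII."""
    # Raises UnicodeDecodeError for any byte above 0x7F.
    return raw.decode("ascii")
```

For example, the two UTF-8 bytes of "é" (0xC3 0xA9) pass through
upgrade_as_latin1 as the two characters U+00C3 and U+00A9, whereas
upgrade_as_ascii raises an error on the same input.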

I'm just getting really, really tired of people whining about Perl's Unicode.

-- 
There is this special biologist word we use for 'stable'. It is
'dead'. -- Jack Cohen
