develooper Front page | perl.perl5.porters | Postings from November 2000

Re: [ID 20001114.001] use utf8;use charnames; is incorrect for \x{80}-\x{FF}

Nick Ing-Simmons
November 14, 2000 10:27
Andrew McNaughton writes:
>I'm rather concerned by what's happening with the utf-8 implementation. By
>trying to modify existing functions, while retaining compatibility with
>existing code, the semantics are getting muddled, 

They _were_ muddled back in perl5.6.0 - there has been a lot of discussion,
and many patches, since then.

>and I expect this to
>lead to a host of security problems.  It is important that utf-8 text
>should be cleanly utf-8.  As soon as character sequences which are not
>valid utf-8 start being processed by utf-8 text handlers, the ambiguities
>will lead to a great many validation and security issues.  I do understand
>that this is difficult territory, but the only way to get through is with
>a clean and consistent data model.

We think we have one now - or rather (for backward compatibility and efficiency
reasons) two:
  A. iso8859-1 bytes 0..255
  B. UTF-8 encoded UNICODE characters.

But those are _supposed_ to be only exposed to the C code of the internals
and XS modules. Perl code sees sequences of characters.
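To make the distinction concrete, here is a minimal sketch in C of how the same
character can sit in memory under representation A or B. It is an illustration
only, not perl's internal code: the names latin1_to_utf8 and char_count are
hypothetical, and the counting routine assumes the UTF-8 buffer is well formed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point in 0..255 as UTF-8.
 * Returns the number of bytes written to out (1 or 2). */
size_t latin1_to_utf8(uint8_t cp, uint8_t out[2])
{
    if (cp < 0x80) {
        out[0] = cp;                          /* 0..127: identical in A and B */
        return 1;
    }
    out[0] = (uint8_t)(0xC0 | (cp >> 6));     /* lead byte carries the top 2 bits */
    out[1] = (uint8_t)(0x80 | (cp & 0x3F));   /* continuation byte, low 6 bits */
    return 2;
}

/* Hypothetical helper: count characters in a buffer.
 * is_utf8 == 0: representation A, one byte per character.
 * is_utf8 != 0: representation B; assumes well-formed UTF-8 and
 * counts every byte that is not a continuation byte (10xxxxxx). */
size_t char_count(const uint8_t *buf, size_t len, int is_utf8)
{
    size_t n = 0;
    size_t i;

    if (!is_utf8)
        return len;
    for (i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

In this sketch the character 0xE9 is the single byte E9 under A and the two
bytes C3 A9 under B, yet char_count reports one character either way. That is
the sense in which Perl code is meant to see characters rather than bytes.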

It gets messy when IO gets involved - and sorting that out is what I should
be doing rather than getting all defensive here ... 

>In my view, introducing a perl-specific
>text encoding scheme which behaves like utf-8 sometimes, but not at other
>times, is a serious mistake.

Possibly, but the need to graft UNICODE support into perl5 without breaking
existing iso8859-1/binary bashing applications forces the issue.

Nick Ing-Simmons
