develooper Front page | perl.perl5.porters | Postings from December 2010

Re: RFC: Summary of proposed handling of surrogates, non-characters,etc for 5.14. Note some backward incompatibility

From:
demerphq
Date:
December 20, 2010 23:51
Subject:
Re: RFC: Summary of proposed handling of surrogates, non-characters,etc for 5.14. Note some backward incompatibility
Message ID:
AANLkTi=Dq5NTGJNW9KganGGn7kHSq-RS8JSMLbm_FkRt@mail.gmail.com
On 20 December 2010 22:52, Eric Brine <ikegami@adaelis.com> wrote:
> On Mon, Dec 20, 2010 at 1:13 PM, demerphq <demerphq@gmail.com> wrote:
>>
>> And the point is that certain codepoints are illegal in general, and
>> so can be treated as essentially mapping to themselves.
>>
>> Others are legal ONLY in UTF16,
>
> You are calling both the encoded form and the decoded form "code point", and
> you are using them interchangeably. I can't respond to your post if you
> don't clear that up.

I don't think I am. The two halves of a surrogate pair are codepoints.
When they are interpreted correctly they produce a different codepoint.

>> So for an example. Consider we have the codepoint U+10400 which case
>> folds to U+10428. When represented in UTF-16 the codepoint U+10400
>> ends up as the surrogate pair U+D801,U+DC00. Now, if somebody naively
>> converts this UTF-16 sequence to UTF8 by converting code point by
>> codepoint, the end result will be that OUR code does NOT see codepoint
>> U+10400,
>
> No, the result will be a warning when you decode the bad UTF-8 you produced
> this way. We all agree the decoder should warn (with an option to disable)
> when it sees invalid UTF-8 or Unicode.

What do you mean "no"? Are you saying that we will see the correct
codepoint U+10400? Because I can assure you that our code will NOT.
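
To make the arithmetic concrete, here is a small sketch of the naive conversion described in the quoted example. It is written in Python rather than Perl purely for illustration, it is not perl's actual code, and the helper names to_surrogates and naive_utf8 are made up for this example.

```python
def to_surrogates(cp):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def naive_utf8(units):
    """Encode each UTF-16 code unit as if it were a code point of its own."""
    # 'surrogatepass' makes Python emit the ill-formed 3-byte form
    # for a surrogate instead of raising an error.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


hi, lo = to_surrogates(0x10400)
naive = naive_utf8([hi, lo])
proper = chr(0x10400).encode("utf-8")

print(hex(hi), hex(lo))  # 0xd801 0xdc00
print(naive.hex())       # eda081edb080
print(proper.hex())      # f0909080

# A lenient decoder hands back two surrogate code points, not U+10400.
decoded = naive.decode("utf-8", "surrogatepass")
print([hex(ord(c)) for c in decoded])  # ['0xd801', '0xdc00']

# A strict decoder rejects the bytes outright.
try:
    naive.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict decode failed:", err.reason)
```

Either way, whatever consumes the decoded string never gets to see U+10400 itself.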

Yes, I agree we should warn when we write UTF8 that contains surrogate
codepoints. However, I also think we should warn when we try to
lc() or uc() a string containing them, as we WILL NOT DO IT CORRECTLY.

There is no room for argument, debate, or personal opinion on the
latter assertion. It is a fact.
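
A rough illustration of the case-mapping problem, using Python's str.lower() as a stand-in for lc(). That substitution is an assumption made only to get a runnable example; perl's internals differ, but the principle is the same: surrogate codepoints have no case mapping of their own.

```python
capital = chr(0x10400)          # DESERET CAPITAL LETTER LONG I
as_surrogates = "\ud801\udc00"  # same letter, left as two surrogate code points

# The real code point lowercases as expected.
print(hex(ord(capital.lower())))  # 0x10428

# The surrogate form comes back unchanged: the capital letter is never seen.
print([hex(ord(c)) for c in as_surrogates.lower()])  # ['0xd801', '0xdc00']
```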

Cheers,
yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

