
Re: refined :utf8 I/O layers proposal

From: Jesse Vincent
Date: January 2, 2011 23:10
Subject: Re: refined :utf8 I/O layers proposal
Message ID: 20110103070953.GA32143@shihtzu



On Fri 24.Dec'10 at  9:03:05 -0700, karl williamson wrote:
> Based on feedback, here's a revised proposal:

This proposal makes sense to me, though I fully acknowledge that I don't
understand all the ins and outs of Unicode.

> No layer allows in syntactically malformed utf8
> 
> :strict_utf8 allows in only what Unicode says is interchangeable

Does this change from version to version of the Unicode standard? If so,
we may want to define versioned layers explicitly (:strict_utf8_60,
:strict_utf8_50, and so on) and state that :strict_utf8 is always an
alias for the most current version of the Unicode standard. If this
Never Changes(tm), ignore this suggestion.

> :safe_utf8 (or maybe :portable_utf8) allows the above plus
> above-unicode code points up to those that begin with 0xfe.  It's
> said that 0xfe and 0xff can start looking like utf16, although I
> don't fully understand the  whole thing.  If we accepted 0xfe and
> not 0xff we still wouldn't ever accept a misconstrued BOM; accepting
> 0xfe goes beyond what a U32 can hold, and so is non-portable.
> Another possibility is for this option to accept only up to what a
> U32 can hold.

I tend to shy away from names including the word "safe" as they
invariably describe something that's discovered not to be safe.
Would it be wrong to describe this one simply as ":utf8"?
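
To make the 0xfe remark in the quoted paragraph concrete, here is a rough sketch of how the original UTF-8 bit pattern extends past four bytes. It is not taken from Perl's source: the 5-, 6- and 7-byte rows simply continue the pattern (each extra continuation byte adds 6 payload bits), and the names LIMITS and lead_byte_and_length are made up for illustration. Under that assumption, a 0xFE lead byte has no payload bits of its own and is followed by six continuation bytes, so it carries 36 bits, which is more than a U32 can hold.

```python
# Sketch only: continues the classic UTF-8 bit pattern beyond 4 bytes.
# The 5/6/7-byte rows are an assumption derived from that pattern,
# not a description of Perl's actual decoder.
# Each row: (largest code point, sequence length, lead-byte prefix bits)
LIMITS = [
    (0x7F,        1, 0x00),
    (0x7FF,       2, 0xC0),
    (0xFFFF,      3, 0xE0),
    (0x1FFFFF,    4, 0xF0),
    (0x3FFFFFF,   5, 0xF8),
    (0x7FFFFFFF,  6, 0xFC),  # 31 payload bits, lead byte 0xFC or 0xFD
    (0xFFFFFFFFF, 7, 0xFE),  # 36 payload bits, lead byte is exactly 0xFE
]

U32_MAX = 0xFFFFFFFF


def lead_byte_and_length(cp):
    """Return (lead_byte, length) for code point cp under the pattern above."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    for limit, length, prefix in LIMITS:
        if cp <= limit:
            # The lead byte holds whatever bits remain above the
            # 6-bit continuation bytes.
            return prefix | (cp >> (6 * (length - 1))), length
    raise ValueError("beyond 36 bits; would need a 0xFF-style extension")


if __name__ == "__main__":
    for cp in (0x10FFFF, 0x7FFFFFFF, 0x80000000, U32_MAX):
        lead, length = lead_byte_and_length(cp)
        print(f"U+{cp:X}: {length} bytes, lead byte 0x{lead:02X}")
```

If that pattern holds, everything up to 0x7FFFFFFF fits without 0xFE, values from 0x80000000 through the top of a U32 need it, and the same lead byte also admits values a U32 cannot represent.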

> :unsafe_utf8 (or :non_portable_utf8) allows in surrogates,
> noncharacter code points, and all above-unicode code points that
> don't overflow the platform's UV.

:unchecked_utf8? (I don't really care as much about the name on this
one)

> :utf8 is aliased to :safe_utf8.  I'm with zefram that the easiest
> thing to do should not allow attack possibilities.
> 
> :no_surrogates prohibits surrogates
> 
> :no_above_unicode prohibits above-unicode code points
> 
> :no_nonchars prohibits non-character code points.
> 
> I believe this gives the orthogonality that xdg wants;  better name
> suggestions welcome
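
For anyone else who doesn't have the three categories memorized, here is a small sketch of the code point classes the :no_* layers in the quoted proposal would filter on. It only models the classification; it says nothing about how PerlIO would implement the layers. The function accepts() and its keyword arguments are hypothetical and merely mirror the proposed layer names.

```python
# Sketch only: models the three orthogonal restrictions from the proposal.
# accepts() and its flags are hypothetical, named after the proposed layers.
MAX_UNICODE = 0x10FFFF


def is_surrogate(cp):
    # UTF-16 surrogate range
    return 0xD800 <= cp <= 0xDFFF


def is_nonchar(cp):
    # U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    return cp <= MAX_UNICODE and (cp & 0xFFFE) == 0xFFFE


def is_above_unicode(cp):
    return cp > MAX_UNICODE


def accepts(cp, no_surrogates=False, no_nonchars=False, no_above_unicode=False):
    """Return True if cp passes every restriction that is switched on."""
    if no_surrogates and is_surrogate(cp):
        return False
    if no_nonchars and is_nonchar(cp):
        return False
    if no_above_unicode and is_above_unicode(cp):
        return False
    return True


if __name__ == "__main__":
    # All three restrictions together approximate the :strict_utf8 idea.
    strict = dict(no_surrogates=True, no_nonchars=True, no_above_unicode=True)
    for cp in (0x41, 0xD800, 0xFFFE, 0x110000):
        print(f"U+{cp:X}: unrestricted={accepts(cp)} strict={accepts(cp, **strict)}")
```

Because each flag tests a disjoint set of code points, any combination can be switched on independently, which is the orthogonality the quoted text is after.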

Best,
Jesse
