develooper Front page | perl.perl5.porters | Postings from December 2010

refined :utf8 I/O layers proposal

karl williamson
December 24, 2010 08:03
Based on feedback, here's a revised proposal:

No layer allows in syntactically malformed utf8.

:strict_utf8 allows in only what Unicode says is interchangeable.

:safe_utf8 (or maybe :portable_utf8) allows the above plus above-unicode
code points, up through those whose encoding begins with 0xfe.  It's said
that 0xfe and 0xff can start to look like UTF-16, although I don't fully
understand the whole issue.  If we accepted 0xfe but not 0xff, we still
wouldn't ever accept a misconstrued BOM.  However, accepting 0xfe goes
beyond what a U32 can hold, and so is non-portable.  Another possibility
is for this option to accept only up to what a U32 can hold.
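
To make the 0xfe / U32 trade-off concrete, here is a rough sketch (in Python, purely for illustration) that computes which start byte a given code point would need.  It assumes the conventional extension of the original UTF-8 bit pattern, in which a 0xfe start byte is followed by six continuation bytes carrying 36 bits in total; that layout and the name start_byte are assumptions of this sketch, not something spelled out in the proposal.

```python
# Upper bound of each sequence length in extended UTF-8, paired with the
# marker bits of its start byte. The 7-byte row (start byte 0xFE, 36 payload
# bits) is the assumed extension beyond the classic 6-byte scheme.
_RANGES = [
    (0x7F, 1, 0x00),
    (0x7FF, 2, 0xC0),
    (0xFFFF, 3, 0xE0),
    (0x1FFFFF, 4, 0xF0),
    (0x3FFFFFF, 5, 0xF8),
    (0x7FFFFFFF, 6, 0xFC),
    (0xFFFFFFFFF, 7, 0xFE),
]


def start_byte(cp):
    """Return the first byte of the extended-UTF-8 encoding of cp."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    for max_cp, length, marker in _RANGES:
        if cp <= max_cp:
            # The start byte carries whatever bits are left over after
            # each continuation byte has taken its 6 bits.
            return marker | (cp >> (6 * (length - 1)))
    # Anything larger would need a 0xFF start byte.
    return 0xFF


if __name__ == "__main__":
    for cp in (0x10FFFF, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 1 << 36):
        print(hex(cp), hex(start_byte(cp)))
```

Under that assumption, everything from 0x80000000 up needs a 0xfe start byte, so the top half of the U32 range and the values just beyond U32 share the same start byte.  That is why stopping exactly at what a U32 can hold is a separate choice from stopping at 0xfe.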

:unsafe_utf8 (or :non_portable_utf8) allows in surrogates, noncharacter 
code points, and all above-unicode code points that don't overflow the 
platform's UV.
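
The three layers above can be sketched as a small acceptance check (Python, for illustration only).  The function names, the uv_bits parameter, and the reading of ":safe_utf8" as "anything a 0xfe-start sequence can carry, i.e. up to 2**36 - 1" are assumptions of this sketch; the noncharacter ranges are the ones Unicode defines.

```python
UNICODE_MAX = 0x10FFFF
# Assumed ceiling for :safe_utf8: the largest value a 0xFE-start sequence holds.
SAFE_MAX = (1 << 36) - 1


def classify(cp):
    """Sort a code point into one of the classes the proposal talks about."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    if cp > UNICODE_MAX:
        return "above_unicode"
    # Noncharacters: U+FDD0..U+FDEF, plus the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "nonchar"
    return "ordinary"


def accepts(layer, cp, uv_bits=64):
    """Would the named (hypothetical) layer let this code point in?"""
    kind = classify(cp)
    if layer == "strict_utf8":
        return kind == "ordinary"
    if layer == "safe_utf8":
        return kind == "ordinary" or (kind == "above_unicode" and cp <= SAFE_MAX)
    if layer == "unsafe_utf8":
        # Anything goes as long as it fits in the platform's UV.
        return cp < (1 << uv_bits)
    raise ValueError("unknown layer: " + layer)


if __name__ == "__main__":
    for layer in ("strict_utf8", "safe_utf8", "unsafe_utf8"):
        print(layer, [accepts(layer, cp) for cp in (0x41, 0xD800, 0xFFFE, 0x110000)])
```

Each layer is a superset of the one before it: strict accepts only ordinary code points, safe adds a bounded range of above-unicode values, and unsafe adds surrogates, noncharacters, and the remaining above-unicode values that fit in a UV.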

:utf8 is aliased to :safe_utf8.  I'm with zefram that the easiest thing 
to do should not allow attack possibilities.

:no_surrogates prohibits surrogates.

:no_above_unicode prohibits above-unicode code points.

:no_nonchars prohibits noncharacter code points.

I believe this gives the orthogonality that xdg wants.  Better name
suggestions are welcome.
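
One way to picture the orthogonality of the three :no_* modifiers is as independent filters, each subtracting one class of code point from a permissive base.  The sketch below (Python, for illustration only) assumes the modifiers stack on top of a layer that otherwise accepts everything; the proposal does not say which layers they combine with, and the function names are hypothetical.

```python
def classify(cp):
    """Sort a code point into one of the classes the modifiers refer to."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    if cp > 0x10FFFF:
        return "above_unicode"
    # Noncharacters: U+FDD0..U+FDEF, plus the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "nonchar"
    return "ordinary"


# Each modifier name maps to exactly one class, so any subset can be combined.
MODIFIERS = {
    "no_surrogates": "surrogate",
    "no_above_unicode": "above_unicode",
    "no_nonchars": "nonchar",
}


def allowed(cp, modifiers=()):
    """Accept cp unless one of the given modifiers prohibits its class."""
    prohibited = {MODIFIERS[m] for m in modifiers}
    return classify(cp) not in prohibited


if __name__ == "__main__":
    print(allowed(0xD800))                     # permissive base
    print(allowed(0xD800, ["no_surrogates"]))  # surrogates filtered out
```

With all three modifiers applied, only ordinary code points remain, which matches the set that :strict_utf8 is described as accepting.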
