perl.perl5.porters | Postings from December 2010

Re: refined :utf8 I/O layers proposal

Eric Brine
December 25, 2010 14:07
On Sat, Dec 25, 2010 at 9:17 AM, David Golden wrote:

> On Karl's proposal, I agree with Zefram that it's headed in the right
> direction.  Let's say that we call perl's internal encoding
> "encoding(int72)" for the sake of argument below.  Then we have two
> encodings:
>  :encoding(UTF-8)
>  :encoding(int72)

There are really three: UTF-8 for interchange, UTF-8 for intrachange, and [...]
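
If "interchange" means strict UTF-8 and "intrachange" means a laxer variant used only inside the process (my reading; Perl's Encode module already distinguishes strict "UTF-8" from lax "utf8"), the difference can be sketched in Python rather than Perl. Python's built-in codec refuses lone surrogates by default, while its "surrogatepass" error handler lets them through using the ordinary UTF-8 bit pattern. The function names below are hypothetical.

```python
# Python analogy, not Perl: strict UTF-8 for interchange versus a lax variant
# that tolerates code points strict UTF-8 forbids (here, a lone surrogate).

LONE_SURROGATE = "\ud800"  # a valid Python str element, not a Unicode scalar value


def encode_strict(text: str) -> bytes:
    """Interchange: refuse anything that is not a Unicode scalar value."""
    return text.encode("utf-8")  # default errors="strict"


def encode_lax(text: str) -> bytes:
    """Intrachange-style: pass surrogates through with the UTF-8 bit pattern."""
    return text.encode("utf-8", "surrogatepass")


try:
    encode_strict(LONE_SURROGATE)
except UnicodeEncodeError as exc:
    print("strict refused:", exc.reason)  # prints "strict refused: surrogates not allowed"

print("lax produced:", encode_lax(LONE_SURROGATE).hex())  # prints "lax produced: eda080"
```

The lax bytes are only safe as long as they never leave the process; a strict decoder on the other end would reject them.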

> A question regarding "safety" -- I believe one of the big safety
> issues is that UTF-8 must always encode/decode to the shortest
> possible sequence. [...] Would we want :encoding(int72_raw)
> as a means of allowing non-shortest sequences?
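
The shortest-form rule is easy to guarantee on the encoding side: if the byte count is chosen from the code point's value, an overlong sequence can never be produced. Below is a minimal Python sketch for standard UTF-8 (up to U+10FFFF). It is not Perl's internal encoder, `encode_code_point` is a hypothetical name, and surrogates are not rejected.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one code point in the shortest possible UTF-8 sequence.

    The length is picked from the value itself, so the result can never be
    an overlong form. Surrogates are not rejected (sketch only).
    """
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if cp < 0x80:  # up to 7 bits -> 1 byte
        return bytes([cp])
    if cp < 0x800:  # up to 11 bits -> 2 bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # up to 16 bits -> 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits -> 4 bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check against Python's own (shortest-form) codec.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert encode_code_point(cp) == chr(cp).encode("utf-8")

print(encode_code_point(0x41))  # prints b'A': one byte, never C1 81 or E0 81 81
```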

I don't see any benefit, and there are lots of downsides. For example, "eq"
doesn't recognize different encodings of the same character. At the very
least, we should officially not support encodings longer than the minimum. But I
don't think that's enough, especially given how easy overlong encodings
are to detect. Any instance of the following bytes indicates an overlong
encoding: [...]
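
To show how cheap detection is for standard UTF-8, here is a Python sketch using the overlong patterns from RFC 3629: a C0 or C1 lead byte, E0 followed by 80-9F, or F0 followed by 80-8F. These are not necessarily the byte list intended above, Perl's longer extended sequences would need more rules, and `find_overlong` is a hypothetical helper. The last lines also show the "eq" problem: the overlong form C1 81 and the single byte 41 both stand for "A", yet a byte-wise comparison says they differ.

```python
from typing import List


def find_overlong(data: bytes) -> List[int]:
    """Return the offsets of lead bytes that begin an overlong UTF-8 sequence.

    Covers standard UTF-8 only (sequences of up to 4 bytes, RFC 3629).
    """
    hits = []
    for i, b in enumerate(data):
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b in (0xC0, 0xC1):
            hits.append(i)  # 2-byte form of U+0000..U+007F
        elif b == 0xE0 and nxt is not None and 0x80 <= nxt <= 0x9F:
            hits.append(i)  # 3-byte form of U+0000..U+07FF
        elif b == 0xF0 and nxt is not None and 0x80 <= nxt <= 0x8F:
            hits.append(i)  # 4-byte form of U+0000..U+FFFF
    return hits


overlong_a = b"\xc1\x81"  # overlong 2-byte encoding of "A" (U+0041)
print(overlong_a == b"A")  # False: byte-wise comparison sees two different strings
print(find_overlong(overlong_a))  # [0]
print(find_overlong("h\u00e9llo \u20ac".encode("utf-8")))  # []
```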


As such, I recommend we do something about them. Options:

   - warn and let it through (yuck!)
   - warn and substitute in U+FFFD
   - warn and recode
   - recode (no warning)
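
The four options could share one byte-level filter that finds each overlong sequence and then keeps it, substitutes U+FFFD, or rewrites it in shortest form, warning unless told not to. Below is a Python sketch under those assumptions. The policy names and `fix_overlongs` are hypothetical, only standard UTF-8 overlongs (up to 4 bytes) are handled, and other malformed bytes are passed through untouched.

```python
import warnings
from typing import Optional, Tuple

REPLACEMENT = "\ufffd".encode("utf-8")  # U+FFFD as bytes: EF BF BD
POLICIES = ("pass", "replace", "recode", "recode_quiet")


def _overlong_at(data: bytes, i: int) -> Optional[Tuple[int, int]]:
    """If an overlong sequence starts at offset i, return (length, code point)."""
    b0 = data[i]
    if b0 in (0xC0, 0xC1):
        n, cp = 2, b0 & 0x1F
    elif b0 == 0xE0 and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0x9F:
        n, cp = 3, b0 & 0x0F
    elif b0 == 0xF0 and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0x8F:
        n, cp = 4, b0 & 0x07
    else:
        return None
    tail = data[i + 1:i + n]
    if len(tail) != n - 1 or any(not 0x80 <= t <= 0xBF for t in tail):
        return None  # truncated or malformed: out of scope for this sketch
    for t in tail:
        cp = (cp << 6) | (t & 0x3F)  # accumulate 6 payload bits per byte
    return n, cp


def fix_overlongs(data: bytes, policy: str) -> bytes:
    """Apply one policy to every overlong sequence in data.

    "pass"         warn and let it through
    "replace"      warn and substitute U+FFFD
    "recode"       warn and rewrite in shortest form
    "recode_quiet" rewrite in shortest form without warning
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    out = bytearray()
    i = 0
    while i < len(data):
        hit = _overlong_at(data, i)
        if hit is None:
            out.append(data[i])  # not overlong: copy the byte unchanged
            i += 1
            continue
        n, cp = hit
        if policy != "recode_quiet":
            warnings.warn(f"overlong encoding of U+{cp:04X} at offset {i}")
        if policy == "pass":
            out += data[i:i + n]
        elif policy == "replace":
            out += REPLACEMENT
        else:
            # surrogatepass: an overlong surrogate is recoded rather than rejected
            out += chr(cp).encode("utf-8", "surrogatepass")
        i += n
    return bytes(out)


print(fix_overlongs(b"x\xc1\x81y", "recode_quiet"))  # b'xAy'
```

Substituting U+FFFD is the only option that discards the intended character; recoding keeps it, but without the warning nobody learns that the input was malformed.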

- Eric
