On Wed, Sep 28, 2011 at 1:09 AM, Karl Williamson <public@khwilliamson.com> wrote:
> This issue keeps coming back up, when I think we have long ago resolved how
> to fix it. Here is my view of how the API should work, and I thought that
> it followed the consensus view. This follows what I think Zefram and David
> Golden proposed more than a year ago.
>
> The default utf8 layer should prohibit malformed utf8, surrogates,
> non-character code points and above-Unicode code points.
>
> There should be an alternate layer, called something like utf8-lax, which
> allows all three, but not malformed utf8. There should be three other
> layers, with names like no-surrogates, no-nonchars, and only-unicode which
> disallow exactly one class, as indicated by their names. It should be then
> possible to combine these to orthogonally allow any combination of the three
> problematic input types.

I would personally prefer it to be one layer with multiple options. I suspect that would be conceptually cleaner when you want to combine them. E.g.: «open my $fh, '<:utf8(surrogates-ok,nonchars-ok)', $filename» or some such.

Leon
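
To make the "orthogonal options" idea concrete, here is a minimal sketch of the per-code-point check that such a layer might perform once the bytes are known to be well-formed. It is not how any existing PerlIO layer is implemented. The function names `problem_class` and `cp_allowed` and the class names `surrogate`, `nonchar` and `above-unicode` are made up for illustration. Only the code point ranges come from Unicode itself.

```perl
use strict;
use warnings;

# Hypothetical helper: name the problematic class a code point falls in,
# or return undef (in scalar context) for an ordinary Unicode scalar value.
sub problem_class {
    my ($cp) = @_;
    return 'above-unicode' if $cp > 0x10FFFF;
    return 'surrogate'     if $cp >= 0xD800 && $cp <= 0xDFFF;
    # Noncharacters: U+FDD0..U+FDEF, plus the last two code points of
    # every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    return 'nonchar'       if ($cp >= 0xFDD0 && $cp <= 0xFDEF)
                           || ($cp & 0xFFFE) == 0xFFFE;
    return;
}

# Hypothetical option check: %allow maps a class name to a true value when
# the caller has opted in to that class. Anything not opted in is rejected,
# which matches the strict default proposed in the quoted message.
sub cp_allowed {
    my ($cp, %allow) = @_;
    my $class = problem_class($cp);
    return !defined($class) || !!$allow{$class};
}

# Roughly what "surrogates-ok,nonchars-ok" would mean under this sketch.
my %allow = (surrogate => 1, nonchar => 1);
print cp_allowed(0xD800,   %allow) ? "ok" : "rejected", "\n";  # surrogate, opted in
print cp_allowed(0x110000, %allow) ? "ok" : "rejected", "\n";  # above Unicode, not opted in
```

Because each class is tested independently, any of the eight combinations of the three classes can be expressed by choosing which keys to set, which is the property both proposals are after.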