
Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set

From: Leon Timmermans
Date: September 27, 2011 16:43
Subject: Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
Message ID: CAHhgV8j4_Dm2K3D-vyu5vHirGM2w5XipJTR+20oUG01hpsnNxw@mail.gmail.com

On Wed, Sep 28, 2011 at 1:09 AM, Karl Williamson
<public@khwilliamson.com> wrote:
> This issue keeps coming back up, when I think we have long ago resolved how
> to fix it.  Here is my view of how the API should work, and I thought that
> it followed the consensus view.  This follows what I think Zefram and David
> Golden proposed more than a year ago.
>
> The default utf8 layer should prohibit malformed utf8, surrogates,
> non-character code points and above-Unicode code points.
>
> There should be an alternate layer, called something like utf8-lax, which
> allows all three, but not malformed utf8.  There should be three other
> layers, with names like no-surrogates, no-nonchars, and only-unicode which
> disallow exactly one class, as indicated by their names.  It should be then
> possible to combine these to orthogonally allow any combination of the three
> problematic input types.

I would personally prefer a single layer that takes multiple options. I
suspect that would be conceptually cleaner when you want to combine
them, e.g. «open my $fh, '<:utf8(surrogates-ok,nonchars-ok)',
$filename» or some such.
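
To make the single-layer idea concrete, here is a minimal sketch of how
such an option list might be parsed and applied to decoded code points.
Nothing here is an existing PerlIO feature: the sub names
(parse_utf8_options, classify, code_point_allowed) are hypothetical, and
"above-unicode-ok" is an assumed spelling for the third class, chosen to
match the two option names suggested above. With no options given, all
three classes are rejected, which matches the proposed strict default.

```perl
use strict;
use warnings;

# Hypothetical option names; only the first two appear in the proposal,
# the third is an assumed spelling for "above-Unicode code points".
my %KNOWN = map { $_ => 1 } qw(surrogates-ok nonchars-ok above-unicode-ok);

# Maps a problem class to the option that would permit it.
my %OPTION_FOR = (
    'surrogate'     => 'surrogates-ok',
    'nonchar'       => 'nonchars-ok',
    'above-unicode' => 'above-unicode-ok',
);

# Parse "utf8" or "utf8(opt,opt,...)" into a hashref of enabled options.
sub parse_utf8_options {
    my ($spec) = @_;
    my ($args) = $spec =~ /\A utf8 (?: \( ([^)]*) \) )? \z/x
        or die "not a utf8 layer spec: $spec\n";
    my %allow;
    for my $opt (grep { length } split /\s*,\s*/, $args // '') {
        die "unknown option: $opt\n" unless $KNOWN{$opt};
        $allow{$opt} = 1;
    }
    return \%allow;
}

# Classify a code point into one of the three problematic classes, or 'ok'.
sub classify {
    my ($cp) = @_;
    return 'above-unicode' if $cp > 0x10FFFF;
    return 'surrogate'     if $cp >= 0xD800 && $cp <= 0xDFFF;
    # Non-characters: U+FDD0..U+FDEF, plus the last two code points
    # of every plane (U+xxFFFE and U+xxFFFF).
    return 'nonchar'
        if ($cp >= 0xFDD0 && $cp <= 0xFDEF) || ($cp & 0xFFFE) == 0xFFFE;
    return 'ok';
}

# True if the code point is acceptable under the parsed options.
sub code_point_allowed {
    my ($cp, $allow) = @_;
    my $class = classify($cp);
    return 1 if $class eq 'ok';
    return $allow->{ $OPTION_FOR{$class} } ? 1 : 0;
}
```

Under these assumptions, parse_utf8_options('utf8(surrogates-ok,nonchars-ok)')
would let U+D800 and U+FFFE through while still rejecting 0x110000, and a
bare 'utf8' would reject all three. Malformed UTF-8 is a separate concern
and would be refused before any of these checks apply.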

Leon
