perl.perl5.porters | Postings from October 2011

Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set

From: Leon Timmermans
Date: October 13, 2011 05:13
Subject: Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
Message ID: CAHhgV8gQLXcRzYMzxwe7CD9WWZBxvwV37WES4vdo=SfsCW+4Kg@mail.gmail.com
On Thu, Sep 29, 2011 at 10:09 AM, Nicholas Clark <nick@ccl4.org> wrote:
>> I would personally prefer it to be one layer with multiple options. I
>> suspect that would be conceptually cleaner when you want to combine
>> them. E.g.: «open my $fh, '<:utf8(surrogates-ok,nonchars-ok)',
>> $filename» or some such.
>
> I *think* so, but somewhere I have notes on what made sense, and some
> combinations don't.
>
> Whilst it would be nice for :utf8 to be the flexible layer, I think it would
> lead to various problems, including problems with security expectations.
> Code you wrote on a perl with it would work just fine, nicely locked down.
>
> Then you run that code on an older perl:
>
> $ echo Works already | perl -we 'open my $fh, "<:utf8(surrogates-ok,nonchars-ok)", "/dev/fd/0"; print <$fh>'
> Works already
> $ echo Works already | perl -we 'open my $fh, "<:utf8(maximally-paranoid)", "/dev/fd/0"; print <$fh>'
> Works already
>
> a) No error. No warning that your input isn't subject to paranoia.
> b) No way to write a compatibility version that works on older perls.
>
>
> I guess that one can solve (a) by having :utf8 fault the new arguments
> unless in the scope of a suitable :feature, but it's starting to feel
> clunky.
>
>
> Also, in terms of Jesse's 5.16+ plan, I can't see how layers are anything
> but interpreter-global. That is, if we change the default for :utf8, it
> has to be for everyone. The code doing validation can't do it on the basis
> of lexical scope, because validation probably has to happen when a buffer is
> filled, not as data are read out.

If we had aliases in PerlIO, all of this could be handled much more
cleanly. :utf8 would mean :utf8-new or :utf8-old depending on scope.

Leon
