perl.perl5.porters | Postings from September 2011

Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set

Nicholas Clark
September 29, 2011 01:09
On Wed, Sep 28, 2011 at 01:43:18AM +0200, Leon Timmermans wrote:

> I would personally prefer it to be one layer with multiple options. I
> suspect that would be conceptually cleaner when you want to combine
> them. E.g: «open my $fh, '<:utf8(surrogates-ok,nonchars-ok)',
> $filename» or some such.

I *think* so, but somewhere I have notes on what made sense, and some
combinations don't.

Whilst it would be nice for :utf8 to be the flexible layer, I think it would
lead to various problems, including problems with security expectations.
Code you wrote on a perl that supports those arguments would work just fine,
nicely locked down.

Then you run that code on an older perl:

$ echo Works already | perl -we 'open my $fh, "<:utf8(surrogates-ok,nonchars-ok)", "/dev/fd/0"; print <$fh>'
Works already
$ echo Works already | perl -we 'open my $fh, "<:utf8(maximally-paranoid)", "/dev/fd/0"; print <$fh>'
Works already

a) No error, and no warning that your input isn't subject to paranoia.
b) No way to write a compatibility version that works on older perls.
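
For comparison, here is a minimal sketch of strict checking that does not rely
on layer arguments at all, so it cannot be silently ignored the way the
arguments above are. It side-steps point (b) rather than solving it: it assumes
the data is read as raw octets and decoded by hand with Encode's strict
"UTF-8" encoding. The helper name decode_strict is hypothetical and only for
illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode octets strictly, dying on malformed input.
# "UTF-8" (with the hyphen) is Encode's strict encoding; "utf8" is the lax one.
sub decode_strict {
    my ($octets) = @_;
    # FB_CROAK makes decode() die on malformed data;
    # LEAVE_SRC stops decode() from modifying $octets in place.
    return Encode::decode('UTF-8', $octets,
        Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Well-formed input decodes to a character string.
my $ok = eval { decode_strict("caf\xC3\xA9") };

# A stray 0xFF octet can never appear in UTF-8, so this dies inside the eval
# and $bad stays undefined.
my $bad = eval { decode_strict("broken \xFF byte"); 1 };
```

Because the failure is an exception rather than an ignored argument, a caller
finds out when the check did not happen.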

I guess that one can solve (a) by having :utf8 fault the new arguments
unless in the scope of a suitable :feature, but it's starting to feel

Also, in terms of Jesse's 5.16+ plan, I can't see how layers are anything
but interpreter-global. That is, if we change the default for :utf8, it
has to be for everyone. The code doing validation can't do it on the basis
of lexical scope, because validation probably has to happen when a buffer is
filled, not as data are read out.
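
The buffer-fill point can be pictured with a toy model. This is only a sketch
under stated assumptions, not how PerlIO is implemented: ToyHandle, its fields
and its validator callback are all hypothetical. It shows that the check runs
once per fill, at a point where nothing knows which lexical scope will later
read the data.

```perl
use strict;
use warnings;

# Toy model only, not PerlIO: a handle-like object with an internal buffer.
package ToyHandle;

sub new {
    my ($class, $validator, @chunks) = @_;
    return bless {
        validator => $validator,   # code ref, called once per buffer fill
        chunks    => [@chunks],    # stands in for the underlying file
        buf       => '',
        fills     => 0,
    }, $class;
}

# Validation happens here, when the buffer is filled. The caller's lexical
# scope is not visible at this point.
sub _fill {
    my ($self) = @_;
    my $chunk = shift @{ $self->{chunks} };
    return 0 unless defined $chunk;
    $self->{validator}->($chunk) or die "invalid input during buffer fill\n";
    $self->{buf} .= $chunk;
    $self->{fills}++;
    return 1;
}

# Reading only hands out data that was already validated at fill time.
sub read_char {
    my ($self) = @_;
    if ($self->{buf} eq '') {
        $self->_fill or return undef;
    }
    return substr($self->{buf}, 0, 1, '');
}

package main;

my $fh = ToyHandle->new(sub { $_[0] !~ /\xFF/ }, "ab", "cd");
my $first  = $fh->read_char;   # triggers a fill; "ab" is validated as a whole
my $second = $fh->read_char;   # served from the buffer, no validation runs
```

Two readers in different lexical scopes would both be served from the same
already-checked buffer, which is why a per-scope default is hard to honour.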

Nicholas Clark
