perl.perl5.porters | Postings from October 2011

[perl #95160] Unicode readdir bugs

From: Father Chrysostomos via RT
Date: October 23, 2011 18:44
Subject: [perl #95160] Unicode readdir bugs
Message ID: rt-3.6.HEAD-31297-1319420666-1803.95160-15-0@perl.org
On Sun Oct 23 18:26:45 2011, Hugmeir wrote:
> There's a couple of things here being grouped as one. Ignoring
> require/use/do for a moment, most of those functions already have bug
> reports on them because, let me quote tchrist here,
> 
> > *Who* told Perl it was ok to let me blithely use wide characters in
> > creat but then forbad me from using them in readdir? That's stupid.
> > Perl should forbid unencoded wide characters in syscalls. It already
> > does in syswrite.
> 
> 
> So, first thing: Be like syswrite. -All- syscalls, sans for
> say/print/printf/warn/die which already have exceptions, should croak if
> passed non-downgradeable scalars.

(Please, don’t put -deable at the end of a Latin-based word. :-) It’s
‘downgradable’.)

syswrite seems to be the odd one out.  It’s probably using SvPVbyte. 
print, die, and warn just warn (i.e., warn chr 256 produces two
warnings).  It’s a default warning, though.
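The proposed rule, that a syscall should croak the way syswrite does whenever an argument cannot be downgraded to bytes, can be sketched in pure Perl with the core `utf8::downgrade` function and its FAIL_OK argument. The helper name `check_bytes_arg` is hypothetical and is not part of perl; this is only an illustration of the check, not how the core would implement it.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of perl): reject an argument the way
# syswrite rejects wide data, i.e. croak unless it can be downgraded to bytes.
sub check_bytes_arg {
    my ($name, $arg) = @_;
    my $copy = $arg;                 # downgrade a copy, never the caller's scalar
    utf8::downgrade($copy, 1)        # FAIL_OK = 1: return false instead of dying
        or croak "Wide character in $name";
    return $copy;
}

# U+00E9 fits in one byte, so this is accepted.
my $ok = check_bytes_arg('opendir', "caf\x{e9}");
print length($ok), "\n";

# U+30CB does not fit in one byte, so this croaks.
my $died = !eval { check_bytes_arg('opendir', "\x{30cb}"); 1 };
print $died ? "croaked: $@" : "accepted\n";
```

Code points up to U+00FF still pass, which is exactly what "downgradable" means here; only strings containing wider characters are refused.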

With the new pragma, I would suggest fixing the Unicode bug for those
functions when the pragma is off (with a warning and fallback).  If that
causes CPAN breakage, then the new behaviour should be enabled with ‘use 

> 
> Second, there should be a way to avoid doing an encode/decode on every
> syscall. Since I haven't read the Python thread yet I can't say much on
> this, but for a while I've had an open-like pragma for this in mind, e.g.
> 
> use syscalls IN => ":encoding(...)", OUT => ":encoding(...)";
> 
> or
> 
> use syscalls :dir => { IN => ":encoding(...)", OUT => ":encoding(...)" }
> 
> Or somesuch, which won't solve problems in, say, Windows, but hopefully it
> won't make them any worse.

I think it would make things worse, as we would have yet another
non-portable interface that is unusable as a result.  In this case it’s
not even portable between Unix systems, because it cannot be used
correctly on Mac OS X, which forces file names on *all* Unix interfaces
to be in UTF-8.

On the other hand we could provide it with lots of caveats in the
documentation.  Maybe it could be part of the same pragma.
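For comparison, this is the per-call encode/decode that such a pragma would save: without it, names are encoded by hand before the syscall, and readdir results, which come back as raw bytes, are decoded by hand. This sketch assumes the file system stores names as UTF-8 (true on Mac OS X, only a convention on other Unix systems, and not the case for the byte-oriented calls on Windows), which is exactly the portability caveat raised above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Assumption: file names on disk are UTF-8 encoded.
my $enc = 'UTF-8';

my $dir  = tempdir(CLEANUP => 1);
my $name = "Eeyup\x{30cb}.txt";      # character string containing U+30CB

# Encode by hand before the syscall...
open my $fh, '>', "$dir/" . encode($enc, $name) or die "open: $!";
close $fh;

# ...and decode by hand after readdir, which returns bytes.
opendir my $dh, $dir or die "opendir: $!";
my @names = map { decode($enc, $_) } grep { !/^\.\.?\z/ } readdir $dh;
closedir $dh;

print scalar(@names), " entry; round-trips: ",
      ($names[0] eq $name ? "yes" : "no"), "\n";
```

Every call site has to repeat this, and every one of them silently assumes the same encoding.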

> Then you could implement unicode::filenames as a
> wrapper around that, and if you want to grab that layer from a locale
> setting, that's entirely up to you (just don't ask me to debug it later).
> 
> Third, require/use/do. I recall Python having some problems with this (if
> the thread that I've neglected reading touches this, I apologize) -- And
> actually, I don't know any language that supports it without issues, though
> pointers are of course welcome.
> Zefram had a great idea for this a while ago -- If a module has Unicode in
> its path, it should get an alias, reachable through some escaping scheme or
> another. So if I had a module Eeyup::\x{30cb}::Bothersome, Bothersome.pm
> would be reachable through Eeyup/\x{30cb}/, and, failing that,
> unialias/Eeyup/130cb/
> 
> Here's the nicest thing -- I implemented 1 and a prototype of 2 in a couple
> of hours, so it's certainly doable, though I haven't touched that in a while
> because I can't figure out a way to test 2 portably.

It sounds like a nice idea at first, but I worry about modules
‘disappearing’ depending on what pragma is enabled.
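The alias idea starts from the way require turns a package name into a relative file name (Foo::Bar becomes Foo/Bar.pm), which is why a non-ASCII package name becomes a non-ASCII path. The sketch below reproduces that mapping roughly and adds a hypothetical `path_needs_alias` check for paths that would have to be escaped; the escaping scheme itself is deliberately not reproduced here.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character" warnings on output

# Roughly how require maps a bareword package name to a file name;
# perl's own logic handles more edge cases than this.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Hypothetical check (not in perl): does the path contain characters that
# would need encoding or an ASCII alias before reaching the file system?
sub path_needs_alias {
    my ($path) = @_;
    return $path =~ /[^\x00-\x7f]/ ? 1 : 0;
}

for my $module ('File::Spec', "Eeyup::\x{30cb}::Bothersome") {
    my $path = module_to_path($module);
    printf "%s -> %s (needs alias: %s)\n",
        $module, $path, path_needs_alias($path) ? 'yes' : 'no';
}
```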



