[perl #95160] Unicode readdir bugs
From: Father Chrysostomos via RT
Date: October 23, 2011 18:44
Subject: [perl #95160] Unicode readdir bugs
Message ID: rt-3.6.HEAD-31297-1319420666-1803.95160-15-0@perl.org
On Sun Oct 23 18:26:45 2011, Hugmeir wrote:
> There's a couple of things here being grouped as one. Ignoring
> require/use/do for a moment, most of those functions already have bug
> reports on them because, let me quote tchrist here,
>
> > *Who* told Perl it was ok to let me blithely use wide characters in
> > creat but then forbad me from using them in readdir? That's stupid.
> > Perl should forbid unencoded wide characters in syscalls. It already
> > does in syswrite.
>
>
> So, first thing: Be like syswrite. -All- syscalls, sans for
> say/print/printf/warn/die which already have exceptions, should croak if
> passed non-downgradeable scalars.
(Please, don’t put -deable at the end of a Latin-based word. :-) It’s
‘downgradable’.)

syswrite seems to be the odd one out. It’s probably using SvPVbyte.
print, die, and warn just warn (i.e., warn chr 256 produces two
warnings). It’s a default warning, though.
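
As a rough illustration of that difference (a sketch written for this note, not code from the ticket; the temporary file and the variable names are only there for the demo), print to a handle without an encoding layer merely warns about a wide character, whereas syswrite throws:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# U+0100 does not fit in a single byte, so the string cannot be downgraded.
my $wide = chr 256;

# A scratch file handle with no :utf8 or :encoding layer (demo only).
my ($fh, $path) = tempfile(UNLINK => 1);

# print only warns ("Wide character in print") and carries on.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    print {$fh} $wide;
}

# syswrite refuses outright and dies ("Wide character in syswrite").
my $ok    = eval { syswrite $fh, $wide; 1 };
my $error = $ok ? '' : $@;

print "print warned: $warnings[0]" if @warnings;
print "syswrite died: $error"      unless $ok;
```

Mixing print and syswrite on one handle is not something to do in real code; it is done here only to keep the comparison short.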
With the new pragma, I would suggest fixing the Unicode bug for those
functions when the pragma is off (with a warning and fallback). If that
causes CPAN breakage, then the new behaviour should be enabled with ‘use
>
> Second, there should be a way to avoid doing an encode/decode on every
> syscall. Since I haven't read the Python thread yet I can't say much on
> this, but for a while I've had an open-like pragma for this in mind, e.g.
>
> use syscalls IN => ":encoding(...)", OUT => ":encoding(...)";
>
> or
>
> use syscalls :dir => { IN => ":encoding(...)", OUT => ":encoding(...)" }
>
> Or somesuch, which won't solve problems in, say, Windows, but hopefully it
> won't make them any worse.
I think it would make things worse, as we would have yet another
non-portable interface that is unusable as a result. In this case it’s
not even portable between Unix systems, because it cannot be used
correctly on Mac OS X, which forces file names on *all* Unix interfaces
to be in UTF-8.
On the other hand we could provide it with lots of caveats in the
documentation. Maybe it could be part of the same pragma.
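
For comparison, the explicit encode/decode round trip that such a pragma would hide looks roughly like the sketch below. It assumes the file system stores names as UTF-8 bytes, which is exactly the kind of assumption that does not hold everywhere; create_file, read_names, $FS_ENCODING and $CHECK are hypothetical names invented for this example, not part of any proposed interface.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Assumption for this sketch: the file system stores names as UTF-8 bytes.
my $FS_ENCODING = 'UTF-8';
# Die on malformed data, and do not modify the string being converted.
my $CHECK       = Encode::FB_CROAK | Encode::LEAVE_SRC;

# Encode a character-string name to bytes before it reaches the syscall.
sub create_file {
    my ($dir, $name) = @_;
    my $path = "$dir/" . encode($FS_ENCODING, $name, $CHECK);
    open my $fh, '>', $path or die "Cannot create file: $!";
    close $fh or die "Cannot close file: $!";
    return;
}

# Decode the bytes that readdir returns back into character strings.
sub read_names {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory: $!";
    my @names = map  { decode($FS_ENCODING, $_, $CHECK) }
                grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @names;
}

my $dir  = tempdir(CLEANUP => 1);
my $name = "\x{30cb}.txt";          # a name containing a wide character
create_file($dir, $name);
my @found = read_names($dir);

printf "round trip %s\n",
    (@found == 1 && $found[0] eq $name) ? 'ok' : 'failed';
```

Every call site has to repeat this dance by hand today, which is the cost the quoted proposal is trying to remove.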
> Then you could implement unicode::filenames as a
> wrapper around that, and if you want to grab that layer from a locale
> setting, that's entirely up to you (just don't ask me to debug it later).
>
> Third, require/use/do. I recall Python having some problems with this (if
> the thread that I've neglected reading touches this, I apologize) -- And
> actually, I don't know any language that supports it without issues, though
> pointers are of course welcome.
> Zefram had a great idea for this a while ago -- If a module has Unicode in
> its path, it should get an alias, reachable through some escaping scheme or
> another. So if I had a module Eeyup::\x{30cb}::Bothersome, Bothersome.pm
> would be reachable through Eeyup/\x{30cb}/, and, failing that,
> unialias/Eeyup/130cb/
>
> Here's the nicest thing -- I implemented 1 and a prototype of 2 in a couple
> of hours, so it's certainly doable, though I haven't touched that in a while
> because I can't figure out a way to test 2 portably.
It sounds like a nice idea at first, but I worry about modules
‘disappearing’ depending on what pragma is enabled.