perl.perl5.porters | Postings from October 2011

Re: [perl #95160] Unicode readdir bugs

From: Brian Fraser
Date: October 23, 2011 20:59
Subject: Re: [perl #95160] Unicode readdir bugs
Message ID: CA+nL+nYhRYOeXf-8m276960KZWgEK6gj-7MFmiEycxnPEAr_=A@mail.gmail.com

On Sun, Oct 23, 2011 at 11:44 PM, Father Chrysostomos via RT <
perlbug-followup@perl.org> wrote:

> On Sun Oct 23 18:26:45 2011, Hugmeir wrote:
> > There's a couple of things here being grouped as one. Ignoring
> > require/use/do for a moment, most of those functions already have bug
> > reports on them because, let me quote tchrist here,
> >
> > *Who* told Perl it was ok to let me blithely use wide characters in
> > > creat but then forbad me from using them in readdir? That's stupid.
> > > Perl should forbid unencoded wide characters in syscalls. It already
> > > does in syswrite.
> >
> >
> > So, first thing: Be like syswrite. -All- syscalls, sans for
> > say/print/printf/warn/die which already have exceptions, should croak if
> > passed non-downgradeable scalars.
>
> (Please, don’t put -deable at the end of a Latin-based word. :-) It’s
> ‘downgradable’.)
>

But I like my half-broken English..! Fine :P


>
> syswrite seems to be the odd one out.  It’s probably using SvPVbyte.
> print, die, and warn just warn (i.e., warn chr 256 produces two
> warnings).  It’s a default warning, though.
>

That's true, but consider which of those has the actually useful
behavior. How many times have you gotten a "Wide character" warning that
left you with mostly worthless output, and had to rerun things after adding
the layers?

Also, how often do you actually want to pass the internal form of UTF-8 to
system calls? I'm not saying it can't happen, but it's certainly not the
common use case. On nearly every other occasion it's a bug that Perl isn't
reporting, and a warning in this case is twice as useless.
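
To make the difference concrete, here is a minimal sketch of the two
behaviors being compared. It assumes a byte-oriented handle with no
encoding layer; the temporary file and the variable names are only for
illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $wide = chr(256);    # U+0100, cannot be downgraded to a byte string

my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);           # make sure no encoding layer is on the handle

# print only warns ("Wide character in print") and then writes the
# internal UTF-8 bytes anyway.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    print {$fh} $wide;
}

# syswrite refuses outright: it dies with "Wide character in syswrite".
my $ok    = eval { syswrite($fh, $wide); 1 };
my $error = $ok ? '' : $@;

printf "print warned:  %s\n", @warnings ? 'yes' : 'no';
printf "syswrite died: %s\n", $ok       ? 'no'  : 'yes';
```

The print call succeeds and leaves you with output you probably did not
want; the syswrite call fails at the point where the bug actually is.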


> With the new pragma, I would suggest fixing the Unicode bug for those
> functions when the pragma is off (with a warning and fallback).  If that
> causes CPAN breakage, then the new behaviour should be enabled with ‘use
>
>
I don't think it would cause any more breakage than when the Fcntl
constant subs became actual ()-prototyped constants. The only things that
"broke" were already broken, but Perl wasn't reporting it.

(I'd have few qualms if this were triggered by a 'use VERSION;' though)

>
> > Second, there should be a way to avoid doing an encode/decode on every
> > syscall. Since I haven't read the Python thread yet I can't say much on
> > this, but for a while I've had a open-like pragma for this in mind, eg
> >
> > use syscalls IN => ":encoding(...)", OUT => ":encoding(...)";
> >
> > or
> >
> > use syscalls :dir => { IN => ":encoding(...)", OUT => ":encoding(...)" }
> >
> > Or somesuch, which won't solve problems in, say, Windows, but hopefully
> > it won't make them any worse.
>
> I think it would make things worse, as we would have yet another
> non-portable interface that is unusable as a result.  In this case it’s
> not even portable between Unix systems, because it cannot be used
> correctly on Mac OS X, which forces file names on *all* Unix interfaces
> to be in UTF-8.
>
> On the other hand we could provide it with lots of caveats in the
> documentation.  Maybe it could be part of the same pragma.
>
>
Um, I'm not sure I follow. Isn't it as portable as the encode/decode calls
that you are forced to use right now? If so yeah, that's pretty bad, but you
can abstract that with something like

use PerlIO::fse;
use syscalls :all => ":fse";
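
For reference, the manual encode/decode dance that such a pragma would
hide looks roughly like the sketch below. It assumes a Unix-like system
whose file names are UTF-8; the $FS_ENCODING variable and the temporary
directory are illustrative, and the syscalls pragma itself is still only a
proposal.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Assumption: the filesystem stores names as UTF-8.
my $FS_ENCODING = 'UTF-8';

my $dir  = tempdir(CLEANUP => 1);
my $name = "\x{30cb}.txt";    # one wide character plus an ASCII suffix

# Encode the character string to bytes before it reaches open().
my $bytes = encode($FS_ENCODING, $name);
open(my $fh, '>', "$dir/$bytes") or die "open: $!";
close($fh) or die "close: $!";

# readdir hands back bytes; decode them to get characters again.
opendir(my $dh, $dir) or die "opendir: $!";
my @names = map  { decode($FS_ENCODING, $_) }
            grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

print "round trip ok\n" if @names == 1 && $names[0] eq $name;
```

Every call that takes or returns a file name needs the same treatment,
which is the repetition the pragma is meant to remove.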


> > Then you could implement unicode::filenames as a
> > wrapper around that, and if you want to grab that layer from a locale
> > setting, that's entirely up to you (just don't ask me to debug it later).
> >
> > Third, require/use/do. I recall Python having some problems with this (if
> > the thread that I've neglected reading touches this, I apologize) -- And
> > actually, I don't know any language that supports it without issues,
> > though pointers are of course welcome.
> > Zefram had a great idea for this a while ago -- If a module has Unicode
> > in its path, it should get an alias, reachable through some escaping
> > scheme or another. So if I had a module Eeyup::\x{30cb}::Bothersome,
> > Bothersome.pm
> > would be reachable through Eeyup/\x{30cb}/, and, failing that,
> > unialias/Eeyup/130cb/
> >
> > Here's the nicest thing -- I implemented 1 and a prototype of 2 in a
> > couple of hours, so it's certainly doable, though I haven't touched that
> > in a while because I can't figure out a way to test 2 portably.
>
> It sounds like a nice idea at first, but I worry about modules
> ‘disappearing’ depending on what pragma is enabled.
>
>
I was thinking in terms of redefining how the core itself looks for the
modules, that is, changing pp_require and friends. If it's implemented as
pragmata, then your worries are spot-on and that could certainly be
troublesome.
More boilerplate for the boilerplate god?

