[perl #95160] Unicode readdir bugs
From:
Father Chrysostomos via RT
Date:
October 23, 2011 21:25
Subject:
[perl #95160] Unicode readdir bugs
Message ID:
rt-3.6.HEAD-31297-1319430338-1697.95160-15-0@perl.org
On Sun Oct 23 21:00:09 2011, Hugmeir wrote:
> On Sun, Oct 23, 2011 at 11:44 PM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:
>
> > On Sun Oct 23 18:26:45 2011, Hugmeir wrote:
> > (Please, don’t put -deable at the end of a Latin-based word. :-) It’s
> > ‘downgradable’.)
> >
>
> But I like my half-broken english..! Fine :P
Please don’t think I’m trying to pick on you. I just see this misuse so
often I thought maybe mentioning it once would give others a hint, too.
Generally, only the consonants c g k m v m z can have -eable after them,
but there are exceptions.
(You don’t know how long I’ve been wanting to bring this up--but now I’m
*way* off topic.)
>
>
> >
> > syswrite seems to be the odd one out. It’s probably using SvPVbyte.
> > print, die, and warn just warn (i.e., warn chr 256 produces two
> > warnings). It’s a default warning, though.
> >
>
> That's true, but consider which one of those has the actually useful
> behavior. How many times have you gotten a "Wide character" warning that
> left you with mostly worthless output, and had to rerun things by adding the
> layers?
Several hundred. But those were one-time one-liners.
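
For anyone following along, here is a minimal sketch of the difference being discussed, assuming a perl where syswrite croaks on wide characters for a handle without an encoding layer. The variable names are only for illustration; it shows print warning, syswrite refusing, and the warning disappearing once a layer is added.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $wide = chr 256;                      # a character that does not fit in one byte
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                             # plain byte handle, no encoding layer

# print only warns ("Wide character in print") and writes the UTF-8 bytes.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    print {$fh} $wide;
}

# syswrite is the odd one out: it refuses the string instead of warning.
my $syswrite_ok    = eval { syswrite $fh, $wide; 1 };
my $syswrite_error = $@;

# Adding a layer is the "rerun with the layers" fix: no warning this time.
binmode $fh, ':encoding(UTF-8)';
my @layered_warnings;
{
    local $SIG{__WARN__} = sub { push @layered_warnings, @_ };
    print {$fh} $wide;
}
close $fh;

print "print warned: ", scalar(@warnings), " time(s)\n";
print "syswrite failed: ", ($syswrite_ok ? "no" : "yes"), "\n";
print "warnings with layer: ", scalar(@layered_warnings), "\n";
```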
> Also, how often do you actually want to pass the internal form of UTF-8 to
> system calls? I'm not saying it can't happen, but it's certainly not the
> common use case. On nearly every other occasion it's a bug that Perl isn't
> reporting, and a warning in this case is twice as useless.
I think we need to warn, for backward-compatibility. I know there have
been times that I relied on UTF-8 interfaces accepting Unicode strings,
without even realising what I was doing. My code worked, after all.
Then module upgrades broke things, but only every tenth time or so that
the code ran, so it remained buggy a long time.
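
A small sketch of why this kind of code breaks only intermittently: two strings can compare equal while holding different internal buffers, and an interface that reads the buffer directly sees different bytes depending on which form it happens to be handed. The variable names here are illustrative only.

```perl
use strict;
use warnings;
use bytes ();    # loads bytes::length without switching on byte semantics

my $downgraded = "caf\xe9";      # stored internally as four Latin-1 bytes
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);        # same characters, now stored as UTF-8

my $same_string = $downgraded eq $upgraded ? 1 : 0;
my $down_buffer = bytes::length($downgraded);
my $up_buffer   = bytes::length($upgraded);

printf "equal as strings: %s\n", $same_string ? 'yes' : 'no';
printf "internal buffer lengths: %d vs %d bytes\n", $down_buffer, $up_buffer;
```

Whether a given string arrives upgraded or downgraded often depends on what some module did to it earlier, which would fit with things breaking after a module upgrade.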
> > With the new pragma, I would suggest fixing the Unicode bug for those
> > functions when the pragma is off (with a warning and fallback). If that
> > causes CPAN breakage, then the new behaviour should be enabled with ‘use
> >
> >
> I don't think it would cause any more breakage than when the Fcntl
> constant subs became actual ()-prototyped constants. The only things that
> "broke" were already broken, but Perl wasn't reporting it.
That’s my thought, but actual smoke reports tend to sway me quickly.
> (I'd have little qualms if this were triggered by a 'use VERSION;'
> though)
>
> >
> > > Second, there should be a way to avoid doing an encode/decode on every
> > > syscall. Since I haven't read the Python thread yet I can't say much on
> > > this, but for a while I've had an open-like pragma for this in mind, e.g.
> > >
> > > use syscalls IN => ":encoding(...)", OUT => ":encoding(...)";
> > >
> > > or
> > >
> > > use syscalls :dir => { IN => ":encoding(...)", OUT => ":encoding(...)" }
> > >
> > > Or somesuch, which won't solve problems in, say, Windows, but hopefully it
> > > won't make them any worse.
> >
> > I think it would make things worse, as we would have yet another
> > non-portable interface that is unusable as a result. In this case it’s
> > not even portable between Unix systems, because it cannot be used
> > correctly on Mac OS X, which forces file names on *all* Unix interfaces
> > to be in UTF-8.
> >
> > On the other hand we could provide it with lots of caveats in the
> > documentation. Maybe it could be part of the same pragma.
> >
> >
> Um, I'm not sure I follow. Isn't it as portable as the encode/decode calls
> that you are forced to use right now? If so yeah, that's pretty bad, but you
> can abstract that with something like
>
> use PerlIO::fse;
> use syscalls :all => ":fse";
The whole point of the unicode::filenames pragma, at least as I envision it,
is to eliminate the need to specify encodings everywhere.
After all, Windows, VMS and Mac OS X all have character sequences for
file names. I think some FreeBSDs might, too, but I’m not sure. So
your explicit encoding suggestion just seems like a can of worms to me,
which will doubtless be misused in CPAN modules by those who don’t
really understand the issues.
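
For reference, this is roughly what the explicit encode/decode calls look like today. It is only a sketch, and it assumes a file system that stores names as UTF-8 bytes and does not normalise them, which is exactly the sort of assumption that does not hold everywhere.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $name = "\x{30cb}.txt";               # file name as a character string

# Assumption: file names on this system are UTF-8 byte sequences.
my $bytes = encode('UTF-8', $name);
open my $fh, '>', "$dir/$bytes" or die "open: $!";
close $fh;

# readdir hands back raw bytes, so every caller has to decode them again.
opendir my $dh, $dir or die "opendir: $!";
my @found = map { decode('UTF-8', $_) } grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

print "round trip ", ($found[0] eq $name ? "succeeded" : "failed"), "\n";
```

Every hard-coded 'UTF-8' above is a place where a module author can pick the wrong encoding for the platform.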
> > > Then you could implement unicode::filenames as a
> > > wrapper around that, and if you want to grab that layer from a locale
> > > setting, that's entirely up to you (just don't ask me to debug it later).
> > >
> > > Third, require/use/do. I recall Python having some problems with this (if
> > > the thread that I've neglected reading touches this, I apologize) -- And
> > > actually, I don't know any language that supports it without issues, though
> > > pointers are of course welcome.
> > > Zefram had a great idea for this a while ago -- If a module has Unicode in
> > > its path, it should get an alias, reachable through some escaping scheme or
> > > another. So if I had a module Eeyup::\x{30cb}::Bothersome, Bothersome.pm
> > > would be reachable through Eeyup/\x{30cb}/, and, failing that,
> > > unialias/Eeyup/130cb/
> > >
> > > Here's the nicest thing -- I implemented 1 and a prototype of 2 in a couple
> > > of hours, so it's certainly doable, though I haven't touched that in a while
> > > because I can't figure out a way to test 2 portably.
> >
> > It sounds like a nice idea at first, but I worry about modules
> > ‘disappearing’ depending on what pragma is enabled.
> >
> >
> I was thinking in terms of redefining how the core itself looks for the
> modules, that is, change pp_require and friends. If it's implemented as
> pragmata, then your worries are spot-on and that could certainly be
> troublesome.
My initial train of thought was a little muddled. In any case, if perl
is to make multiple attempts to load the file, using different methods,
ignoring any pragmata, then that concern is irrelevant. But how many
attempts should perl be making?
If some OSes use Aristotle’s approach, then we only need *two* attempts,
and Zefram’s plan, although it would have been wonderful if 5.8 had
implemented it, will have to be discarded.
There are already people using ‘use Mödule’ on OS X. We shouldn’t break
their code.
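
To make "multiple attempts" concrete, here is a rough sketch of how the candidate paths for a module name might be built. The helper candidate_paths is hypothetical. The choice of the two attempts (the UTF-8 encoding of the name first, then the downgraded single-byte form where one exists) is my own assumption for illustration, not something settled in this thread and not how pp_require works today.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: list the byte-string paths a loader could try.
sub candidate_paths {
    my ($module) = @_;
    (my $rel = $module) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar
    $rel .= '.pm';

    # First attempt (assumed): the name encoded as UTF-8.
    my @candidates = (encode('UTF-8', $rel));

    # Second attempt (assumed): the single-byte form, if the name has one.
    my $native = $rel;
    if (utf8::downgrade($native, 1) && $native ne $candidates[0]) {
        push @candidates, $native;
    }
    return @candidates;
}

my @latin = candidate_paths("M\x{f6}dule");            # two distinct byte forms
my @wide  = candidate_paths("Eeyup::\x{30cb}::Bothersome");
my @ascii = candidate_paths("File::Temp");

printf "%d, %d and %d candidate(s)\n", scalar @latin, scalar @wide, scalar @ascii;
```

A name above U+00FF has no single-byte form, and a pure ASCII name encodes identically either way, so both end up with a single candidate.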
> More boilerplate for the boilerplate god?
???