[perl #95160] Unicode readdir bugs
From:
tchrist1
Date:
July 19, 2011 11:39
Subject:
[perl #95160] Unicode readdir bugs
Message ID:
rt-3.6.HEAD-30268-1311100744-1564.95160-75-0@perl.org
# New Ticket Created by tchrist1
# Please include the string: [perl #95160]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=95160 >
I'm really rather unhappy with the what-you-see-isn't-what-you-get
approach Perl is taking here.
Consider this:
#!/usr/bin/env perl
use v5.12;
use utf8;
use strict;
use autodie;
use warnings;
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
END { close STDOUT }
my @στιγματα = qw( ΣΤΙΓΜΑΣ στιγμασ στιγμας );
for my $στιγμα (@στιγματα) {
    my $fh;
    open $fh, "> :utf8", $στιγμα;
    say $fh "στιγμα";
    close $fh;
}
opendir(my $dh, ".");
while (readdir($dh)) {
    say if /\P{ASCII}/;
}
closedir($dh);
Run on Linux, I get this nonsense:
ÏÏιγμαÏ
ÏÏιγμαÏ
ΣΤÎÎÎÎΣ
Run on Darwin, I get this, which is even worse:
ÏÏιγμαÏ
ΣΤÎÎÎÎΣ
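The garbage above looks like plain double encoding: readdir hands back the raw UTF-8 bytes of each name, Perl treats every byte as a Latin-1 character, and the :utf8 layer on STDOUT encodes them a second time. A minimal sketch of that mechanism, assuming the filesystem stores names as UTF-8 (the variable names are only illustrative):

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

my $name  = "στιγμας";                 # 7 characters
my $bytes = encode("UTF-8", $name);    # what readdir hands back: 14 octets

# Printing $bytes through a :utf8 layer treats each octet as a
# Latin-1 character and encodes it again.
my $twice = encode("UTF-8", $bytes);   # 28 octets of mojibake

printf "chars=%d bytes=%d double-encoded=%d\n",
    length($name), length($bytes), length($twice);

# Decoding the bytes exactly once restores the original string.
my $back = decode("UTF-8", $bytes);
print "round trip ok\n" if $back eq $name;
```

Every octet of a two-byte Greek letter is above 0x7F, so each one doubles again on output, which is why the names come out twice as long and unreadable.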
*Who* told Perl it was ok to let me blithely use wide characters in
creat but then to forbid me from using them in readdir? That's stupid.
Perl should forbid unencoded wide characters in syscalls. It already
does in syswrite. Why not here?
Yes, if I make my loop
while (my $enc = readdir($dh)) {
    use Encode qw(decode);
    $_ = decode "UTF-8", $enc;
    say if /\P{ASCII}/;
}
Then I get
στιγμας
στιγμασ
ΣΤΙΓΜΑΣ
on Linux and
στιγμας
ΣΤΙΓΜΑΣ
on Darwin.
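Until something better exists, the encode-on-the-way-in, decode-on-the-way-out dance can at least be hidden in a helper. A sketch, assuming UTF-8 file names; decoded_readdir is a hypothetical name, not something Perl provides, and the temporary directory is only there to keep the example self-contained:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Hypothetical helper: return the entries of $dir as character strings,
# assuming the filesystem stores names as UTF-8.
sub decoded_readdir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = map { decode("UTF-8", $_) } readdir($dh);
    closedir($dh);
    return grep { $_ ne "." && $_ ne ".." } @names;
}

my $dir  = tempdir(CLEANUP => 1);
my $name = "στιγμα";

# Encode the name explicitly before it reaches the syscall.
my $path = "$dir/" . encode("UTF-8", $name);
open(my $fh, ">", $path) or die "open: $!";
close($fh);

binmode(STDOUT, ":utf8");
print "$_\n" for decoded_readdir($dir);
```

This only papers over the problem in one program; it does nothing for glob() or for names that are not valid UTF-8.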
But that's nutty, and in several ways.
First off, Darwin's case-insensitive filesystem is an idiot, and doesn't
work correctly. Notice how it is not doing casefolding correctly. It
let me create two files whose names are casefolds of each other, even
though all three names are.
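To check that all three names really do share one casefold, compare them under Unicode full casefolding. A sketch using the fc function, which assumes Perl 5.16 or later (newer than the 5.12 the script above asks for):

```perl
use v5.16;      # enables strict, say and fc
use utf8;
use warnings;

my @names = qw( ΣΤΙΓΜΑΣ στιγμασ στιγμας );

# Capital sigma, small sigma and final sigma all casefold to small sigma.
my %seen;
$seen{ fc($_) }++ for @names;

say scalar(keys %seen) == 1
    ? "all three names are casefolds of each other"
    : "the names casefold differently";
```

A filesystem that folded case this way would have kept only one of the three files, not two.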
But secondly and of greater importance, I should be able to
do something like:
binmode($dh, ":utf8");
or even
opendir(my $dh, ":utf8", ".");
And not have to deal with this really really stupid encoding business.
Is there a reason that this is not a bug that should be fixed?
And don't even get me started about glob(). It's broken, too.
Have fun with HFS+'s quasi-NFD filesystem, eh?
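The HFS+ trouble is that a name created in composed form comes back from readdir decomposed, so a plain eq against the original fails even after decoding. A sketch of the comparison, assuming the core Unicode::Normalize module and an accented name picked purely for illustration, since none of the names above decompose. NFD() here only approximates what the filesystem does:

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

# U+03AF is GREEK SMALL LETTER IOTA WITH TONOS, a precomposed character.
my $created  = "στ\x{3AF}γμα";
my $returned = NFD($created);    # roughly what an NFD filesystem hands back

printf "created=%d chars, returned=%d chars\n",
    length($created), length($returned);
print "raw comparison fails\n"    if $created ne $returned;
print "NFC comparison succeeds\n" if NFC($created) eq NFC($returned);
```

So any code that compares directory entries against names it wrote earlier has to normalize both sides first.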
--tom
Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
Platform:
osname=openbsd, osvers=4.4, archname=OpenBSD.i386-openbsd
uname='openbsd chthon 4.4 generic#0 i386 '
config_args='-des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=y, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
optimize='-O2',
cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='3.3.5 (propolice)', gccosandvers='openbsd4.4'
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags ='-Wl,-E -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-lgdbm -lm -lutil -lc
perllibs=-lm -lutil -lc
libc=/usr/lib/libc.so.48.0, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC -L/usr/local/lib -fstack-protector'
Characteristics of this binary (from libperl):
Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
USE_PERL_ATOF
Built under openbsd
Compiled at Jun 11 2011 11:48:28
%ENV:
PERL_UNICODE="SA"
@INC:
/usr/local/lib/perl5/site_perl/5.14.0/OpenBSD.i386-openbsd
/usr/local/lib/perl5/site_perl/5.14.0
/usr/local/lib/perl5/5.14.0/OpenBSD.i386-openbsd
/usr/local/lib/perl5/5.14.0
/usr/local/lib/perl5/site_perl/5.12.3
/usr/local/lib/perl5/site_perl/5.11.3
/usr/local/lib/perl5/site_perl/5.10.1
/usr/local/lib/perl5/site_perl/5.10.0
/usr/local/lib/perl5/site_perl/5.8.7
/usr/local/lib/perl5/site_perl/5.8.0
/usr/local/lib/perl5/site_perl/5.6.0
/usr/local/lib/perl5/site_perl/5.005
/usr/local/lib/perl5/site_perl
.