develooper Front page | perl.perl5.porters | Postings from July 2011

[perl #95160] Unicode readdir bugs

From:       tchrist1
Date:       July 19, 2011 11:39
Subject:    [perl #95160] Unicode readdir bugs
Message ID: rt-3.6.HEAD-30268-1311100744-1564.95160-75-0@perl.org
# New Ticket Created by  tchrist1 
# Please include the string:  [perl #95160]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=95160 >


I'm really rather unhappy with the what-you-see-isn't-what-you-get
approach Perl is taking here.

Consider this:

    #!/usr/bin/env perl
    use v5.12;
    use utf8;
    use strict;
    use autodie;
    use warnings;
    binmode(STDOUT, ":utf8");
    binmode(STDERR, ":utf8");
    END { close STDOUT  }
    my @στιγματα = qw( ΣΤΙΓΜΑΣ στιγμασ στιγμας );
    for my $στιγμα (@στιγματα) {
        my $fh;
        open $fh, "> :utf8", $στιγμα;
        say $fh "στιγμα";
        close $fh;
    }
    opendir(my $dh, ".");
    while (readdir($dh)) {
        say if /\P{ASCII}/;
    }
    closedir($dh);

Run on Linux, I get this nonsense:

    στιγμας
    στιγμασ
    ΣΤΙΓΜΑΣ

Run on Darwin, I get this, which is even worse:

    στιγμας
    ΣΤΙΓΜΑΣ

*Who* told Perl it was OK to let me blithely use wide characters in
creat but then forbade me from using them in readdir?  That's stupid.
Perl should forbid unencoded wide characters in syscalls.  It already
does in syswrite.  Why not here?
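For comparison, here is a minimal sketch of the syswrite behaviour referred to above. It assumes a raw handle with no `:utf8` layer; the scratch file from File::Temp exists only for the demo. A string whose characters all fit in a byte is written as-is, while a string containing U+03C3 makes syswrite die rather than guess an encoding.

```perl
#!/usr/bin/env perl
# Sketch: syswrite() refuses wide characters on a raw (non-:utf8) handle
# instead of silently writing Perl's internal representation.
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);    # raw bytes, no :utf8 layer

# Every character here is <= 0xFF, so each fits in a single byte.
my $bytes_written = syswrite($fh, "caf\x{E9}");

# U+03C3 GREEK SMALL LETTER SIGMA cannot be written as one byte.
my $wide_ok = eval { syswrite($fh, "\x{3C3}"); 1 };
my $error   = $@;

print "bytes written: $bytes_written\n";    # bytes written: 4
print $wide_ok ? "wide write succeeded\n" : "wide write died: $error";
```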

Yes, if I make my loop 

    while (my $enc = readdir($dh)) {
        use Encode qw(decode);
        $_ = decode "UTF-8", $enc;
        say if /\P{ASCII}/;
    }

then I get

    στιγμας
    στιγμασ
    ΣΤΙΓΜΑΣ

on Linux and

    στιγμας
    ΣΤΙΓΜΑΣ

on Darwin.
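The asymmetry behind that decode workaround can be sketched as follows, assuming a Unix-like system whose filesystem stores names as opaque bytes (typical Linux setups). The wide-character name is handed to the syscall as its UTF-8 bytes, and readdir returns those same bytes undecoded, so the entry matches `encode("UTF-8", $name)` but not `$name` until it is decoded by hand. The temporary directory is only scaffolding for the demo.

```perl
#!/usr/bin/env perl
# Sketch: a wide-character filename goes out as UTF-8 bytes but comes back
# from readdir as undecoded bytes. Assumes a byte-oriented Unix filesystem.
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $name = "\x{3C3}\x{3C4}\x{3B9}\x{3B3}\x{3BC}\x{3B1}";    # "στιγμα"

open(my $fh, '>', "$dir/$name") or die "open: $!";
close($fh);

opendir(my $dh, $dir) or die "opendir: $!";
my ($raw) = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

my $same_bytes   = $raw eq encode("UTF-8", $name);    # on-disk bytes are UTF-8
my $same_string  = $raw eq $name;                     # but not the string we used
my $decoded_back = decode("UTF-8", $raw) eq $name;    # until decoded by hand

# raw length 12, original length 6
printf "raw length %d, original length %d\n", length($raw), length($name);
```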

But that's nutty, and in several ways.

First off, Darwin's case-insensitive filesystem is an idiot and doesn't
work correctly.  Notice how it is not doing casefolding correctly.  It
let me create two files whose names are casefolds of each other, even
though all three names casefold to the same string.
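A sketch of the casefolding point, assuming Perl 5.16 or later for fc() (newer than the 5.14.0 this report was filed against). It groups the three test names by lc and by full casefold: lc leaves final sigma (ς) alone and so yields two groups, while fc maps both Σ and ς to σ and yields one.

```perl
#!/usr/bin/env perl
# Sketch: group names by full Unicode casefold. Needs Perl 5.16+ for fc().
use v5.16;       # turns on strict and the 'fc' feature
use warnings;
use utf8;        # the Greek literals below are UTF-8 source text
binmode(STDOUT, ":encoding(UTF-8)");

my @names = qw( ΣΤΙΓΜΑΣ στιγμασ στιγμας );

# lc() does not turn final sigma into σ, so two distinct keys remain.
my %by_lc;
push @{ $by_lc{ lc $_ } }, $_ for @names;

# fc() folds both Σ and ς to σ, so all three names share one key.
my %by_fc;
push @{ $by_fc{ fc $_ } }, $_ for @names;

say "lc classes: ", scalar keys %by_lc;
say "fc classes: ", scalar keys %by_fc;
say "$_ <= @{ $by_fc{$_} }" for sort keys %by_fc;
```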

But secondly and of greater importance, I should be able to 
do something like:

    binmode($dh, ":utf8");

or even 

    opendir(my $dh, ":utf8", ".");

and not have to deal with this really, really stupid encoding business.
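Until something like a `:utf8` dirhandle exists, one way to approximate it is a small wrapper that decodes every entry. `read_dir_utf8` below is a hypothetical helper, not part of Perl, and it assumes every name in the directory is valid UTF-8 (it dies otherwise). The demo creates its file under an explicitly encoded name in a scratch directory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Hypothetical helper (not a Perl builtin): list a directory and return
# decoded names. FB_CROAK dies on malformed UTF-8; LEAVE_SRC keeps the
# raw readdir strings unmodified.
sub read_dir_utf8 {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my @names = map { decode("UTF-8", $_, $check) } readdir($dh);
    closedir($dh);
    return @names;
}

# Demo: create one file under an explicitly encoded name, then list it.
my $dir  = tempdir(CLEANUP => 1);
my $name = "\x{3C3}\x{3C4}\x{3B9}\x{3B3}\x{3BC}\x{3B1}";    # "στιγμα"
open(my $fh, '>', "$dir/" . encode("UTF-8", $name)) or die "open: $!";
close($fh);

binmode(STDOUT, ":encoding(UTF-8)");
my @non_ascii = grep { /\P{ASCII}/ } read_dir_utf8($dir);
print "$_\n" for @non_ascii;
```

Encoding the name explicitly on the way in, as the demo does, keeps creation and listing symmetric instead of relying on Perl's internal representation of the string.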

Is there any reason this is not a bug that should be fixed?

And don't even get me started on glob().  It's broken, too.
Have fun with HFS+'s quasi-NFD filesystem, eh?
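On the normalization point, a minimal sketch using the core Unicode::Normalize module: a name typed in precomposed form and the decomposed form a filesystem like HFS+ may hand back are different strings until both are normalized. The choice of ά (U+03AC) is illustrative only; it stands in for whatever name came back from readdir or glob.

```perl
#!/usr/bin/env perl
# Sketch: compare names only after normalizing both sides, since a
# decomposing filesystem may return NFD rather than what was typed.
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $typed    = "\x{3AC}";      # GREEK SMALL LETTER ALPHA WITH TONOS, precomposed
my $returned = NFD($typed);    # alpha + COMBINING ACUTE ACCENT, as NFD

my $raw_equal  = $typed eq $returned;              # false: different code points
my $norm_equal = NFC($typed) eq NFC($returned);    # true once both are NFC

printf "lengths: typed %d, decomposed %d\n", length($typed), length($returned);
print "raw eq: ", ($raw_equal  ? "yes" : "no"), "\n";
print "NFC eq: ", ($norm_equal ? "yes" : "no"), "\n";
```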

--tom

Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
   
  Platform:
    osname=openbsd, osvers=4.4, archname=OpenBSD.i386-openbsd
    uname='openbsd chthon 4.4 generic#0 i386 '
    config_args='-des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (propolice)', gccosandvers='openbsd4.4'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lgdbm -lm -lutil -lc
    perllibs=-lm -lutil -lc
    libc=/usr/lib/libc.so.48.0, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC  -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl): 
  Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
                        USE_PERL_ATOF
  Built under openbsd
  Compiled at Jun 11 2011 11:48:28
  %ENV:
    PERL_UNICODE="SA"
  @INC:
    /usr/local/lib/perl5/site_perl/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/site_perl/5.14.0
    /usr/local/lib/perl5/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/5.14.0
    /usr/local/lib/perl5/site_perl/5.12.3
    /usr/local/lib/perl5/site_perl/5.11.3
    /usr/local/lib/perl5/site_perl/5.10.1
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/local/lib/perl5/site_perl/5.8.7
    /usr/local/lib/perl5/site_perl/5.8.0
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org