perl.perl5.porters | Postings from September 2011

[perl #95160] Unicode readdir bugs

Father Chrysostomos via RT
September 18, 2011 17:40
On Tue Jul 19 11:39:04 2011, tom christiansen wrote:
> I'm really rather unhappy with the what you see isn't
> what you get approach Perl is taking here.
> Consider this:
>     #!/usr/bin/env perl
>     use v5.12;
>     use utf8;
>     use strict;
>     use autodie;
>     use warnings;
>     binmode(STDOUT, ":utf8");
>     binmode(STDERR, ":utf8");
>     END { close STDOUT  }
>     my @στιγματα = qw( ΣΤΙΓΜΑΣ στιγμασ στιγμας );
>     for my $στιγμα (@στιγματα) {
>         my $fh;
>         open $fh, "> :utf8", $στιγμα;
>         say $fh "στιγμα";
>         close $fh;
>     }
>     opendir(my $dh, ".");
>     while (readdir($dh)) {
>         say if /\P{ASCII}/;
>     }
>     closedir($dh);
> Run on Linux, I get this nonsense:
>     στιγμας
>     στιγμασ
> Run on Darwin, I get this, which is even worse:
>     στιγμας
> *Who* told Perl it was ok to let me blithely use wide characters in
> creat but then forbad me from using them in readdir?  That's stupid.
> Perl should forbid unencoded wide characters in syscalls.  It already
> does in syswrite.  Why not here?

Almost all (if not all) Perl functions that take file names have this
problem.  They ignore the UTF8 flag and hand the string's internal
octets to the system as they are.
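To make the flag issue concrete, here is a small sketch that never touches the file system. It only shows that two strings which compare equal can have different internal octets, which is what a function ignoring the UTF8 flag ends up passing along. The variable names are mine, for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use bytes ();    # load bytes::length without turning on byte semantics

# Four characters, stored internally as four Latin-1 octets (flag off).
my $downgraded = "caf\x{E9}";

# The same four characters, stored internally as UTF-8 (flag on).
my $upgraded = $downgraded;
utf8::upgrade($upgraded);

# Perl considers the two strings equal...
print "equal as strings: ", ($downgraded eq $upgraded ? "yes" : "no"), "\n";

# ...but their flags and internal octet counts differ, so code that
# ignores the flag sees two different names.
printf "UTF8 flag:       %d vs %d\n",
    (utf8::is_utf8($downgraded) ? 1 : 0), (utf8::is_utf8($upgraded) ? 1 : 0);
printf "internal octets: %d vs %d\n",
    bytes::length($downgraded), bytes::length($upgraded);
```

Whether the two representations really end up as two different files depends on the platform and file system, so the sketch stops at the octet counts.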

I would suggest we use a ‘Wide character’ warning, as we have for print
and warn.
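Roughly what I have in mind, written in Perl for illustration only. The real check would live in the core, and warn_if_wide is a made-up helper, not an existing function. It uses the same threshold as print: only code points above 0xFF count as wide.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;    # the Greek literal below is UTF-8 source text

# Hypothetical helper (not part of Perl): warn when a file name holds
# characters that cannot be represented as single octets.
sub warn_if_wide {
    my ($op, $name) = @_;
    return 0 unless $name =~ /[^\x{00}-\x{FF}]/;
    warn "Wide character in $op\n";
    return 1;
}

# Collect warnings so the example can report how many were raised.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

warn_if_wide("open", "plain.txt");    # ASCII only: silent
warn_if_wide("open", "caf\x{E9}");    # Latin-1 range only: silent
warn_if_wide("open", "στιγμας");      # code points above 0xFF: warns

print scalar(@warnings), " warning(s)\n";
```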

Then we also need a pragma to enable Unicode filenames in -e, open,
readdir, chdir, etc.
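As a very rough sketch of the pragma side, using the %^H hint hash described in perlpragma: the name unifilenames, the hint key, and the filename_octets helper are all placeholders I made up, and encoding to UTF-8 is an assumption about the file system. A real implementation would hook the built-ins themselves rather than ask callers to go through a helper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;      # the Greek literal below is UTF-8 source text
use Encode ();

{
    package unifilenames;    # hypothetical pragma name

    # Record the pragma state in the lexically scoped hint hash.
    sub import   { $^H{"unifilenames/on"} = 1 }
    sub unimport { $^H{"unifilenames/on"} = 0 }

    # Return the octets to hand to the system: encoded if the calling
    # scope enabled the pragma, untouched otherwise.
    sub filename_octets {
        my ($name) = @_;
        my $hints = (caller(0))[10];    # hints in effect at the call site
        return $name unless $hints && $hints->{"unifilenames/on"};
        return Encode::encode("UTF-8", $name);    # assumption: UTF-8 names
    }
}

my $name = "στιγμας";
my ($inside, $outside);
{
    # Equivalent to "use unifilenames;" if the package were in its own file.
    BEGIN { unifilenames->import }
    $inside = unifilenames::filename_octets($name);
}
$outside = unifilenames::filename_octets($name);

printf "inside the scope:  %d octets\n",     length($inside);
printf "outside the scope: %d characters\n", length($outside);
```

Because the state lives in %^H, the effect ends with the enclosing block, the same way strict and warnings do.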

What should we call it?

What do we do on systems on which file names *are* just octet sequences
and nothing more? Make loading the pragma die? Make it warn? Do nothing?

Also, what about systems that support Unicode, but for which no one has
had the time to implement this?  (I’m not going to do VMS, for instance.)
