perl.perl5.porters | Postings from May 2010

Re: [perl #75000] Unicode symbols damaged in $File::Find::name

From: Dave Mitchell
Date: May 10, 2010 10:09
Subject: Re: [perl #75000] Unicode symbols damaged in $File::Find::name
Message ID: 20100510170427.GL26313@iabyn.com

On Sun, May 09, 2010 at 11:57:36AM -0700, Vladimir Morozov wrote:
> when executed following code
> find(sub {
>     return if -d $File::Find::name;
>     return if ! /$suffixes$/;
>     my $name = $File::Find::name;
>     print 'File: ';
>     print $_;
>     print ' Path: ';
>     print $name;
> }, $directory);
> with folder containing files named with non-latin characters the output of '$name' contains damaged unicode characters.
> If $directory also contains non-latin characters only file names are damaged ($directory part is correct)

This is a general issue with filenames, and is not restricted to
File::Find. For example, the following shows that a filename returned by
glob comes back as a byte string rather than a decoded character string
(Dump shows the UTF8 flag set on $f but not on $newf):

    my $f = "file\x{100}";
    open my $fh, '>', $f or die "open: $!\n";
    close $fh;

    my ($newf) = <file*>;
    use Devel::Peek;
    Dump $f;
    Dump $newf;
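
The same check can be sketched without Devel::Peek by asking
utf8::is_utf8() about each string and comparing lengths. This is only a
rough illustration: the temporary directory (via File::Temp) is my own
addition so the test file gets cleaned up, it assumes a Unix-like
filesystem that stores the name bytes unchanged, and it assumes the
temporary path contains no spaces or glob metacharacters.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a scratch directory, so the test file does not pollute cwd.
my $dir = tempdir(CLEANUP => 1);

# A character string: \x{100} forces the UTF8 flag on.
my $f = "$dir/file\x{100}";
open my $fh, '>', $f or die "open: $!\n";
close $fh;

# The name as handed back by the filesystem via glob.
my ($newf) = glob("$dir/file*");
die "glob found nothing\n" unless defined $newf;

# Report the flag and the length as Perl sees each string.
printf "original: utf8 flag=%d length=%d\n",
    (utf8::is_utf8($f)    ? 1 : 0), length $f;
printf "globbed:  utf8 flag=%d length=%d\n",
    (utf8::is_utf8($newf) ? 1 : 0), length $newf;
```

If the globbed name is reported without the flag and with a greater
length than the original, Perl is counting the bytes of the encoded
name rather than its characters.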

A workaround (if you know that the filenames are UTF-8 encoded) is to
decode the returned filename from UTF-8 before using it, e.g.:

    my $name = $_;
    utf8::decode($name);
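
Applied to the File::Find case from the report, the workaround might
look roughly like the sketch below. decoded_name() is a hypothetical
helper, not part of File::Find, and the temporary directory and test
file are there only to make the example self-contained. It assumes the
starting directory is passed as a byte string and that the filenames on
disk really are UTF-8; names that do not decode are returned unchanged.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return a decoded copy of a filename, or the
# original bytes if they are not well-formed UTF-8.
sub decoded_name {
    my ($bytes) = @_;
    my $copy = $bytes;
    # utf8::decode works in place and returns false on failure.
    return utf8::decode($copy) ? $copy : $bytes;
}

# Assumption: a scratch directory holding one non-Latin filename.
my $directory = tempdir(CLEANUP => 1);
open my $fh, '>', "$directory/file\x{100}.txt" or die "open: $!\n";
close $fh;

# Encode characters back to UTF-8 on output.
binmode STDOUT, ':encoding(UTF-8)';

my @found;
find(sub {
    return if -d $_;    # filesystem tests use the raw byte name
    my $name = decoded_name($File::Find::name);
    push @found, $name;
    print 'File: ', decoded_name($_), ' Path: ', $name, "\n";
}, $directory);
```

Note that the raw byte name is still what gets passed to -d and to any
open() call; only the copy used for display or comparison is decoded.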

I notice that perltodo.pod has this entry:

    =head2 Unicode and glob()

    Currently glob patterns and filenames returned from File::Glob::glob()
    are always byte strings.  See L</"Virtualize operating system access">.

and perlrun.pod has this entry:

    =item B<-C [I<number/list>]>
    ...
    =for todo
    perltodo mentions Unicode in %ENV and filenames. I guess that these will be
    options e and f (or F).


-- 
This is a great day for France!
    -- Nixon at Charles De Gaulle's funeral
