develooper Front page | perl.perl5.porters | Postings from May 2010

Unicode <DATA> and -CLADS and such

From: Tom Christiansen
Date: May 15, 2010 12:04
Subject: Unicode <DATA> and -CLADS and such
Message ID: 24975.1273950258@chthon

Am I correct that the -C command-line switch has no effect on the DATA
handle(s)?  If so, this seems to be undocumented--but I have a wisp of a
memory of a semi-recent discussion about it.  Was there?

I came across the issue in a program running under various
PERL_UNICODE settings and using this to find its data:

    $input_fh = (@ARGV || !-t STDIN) ? *ARGV : *DATA;

That behaves rather differently depending on whether you are
really using stdin, or a file, or the DATA handle.  However:

    * You can't run with C<perl -CLADS> or whatnot to "fix" it.
      This seems counter to the obvious reading of perlrun(1).

    * C<use open> has no effect.

    * C<use utf8> works.

    * C<use encoding ":utf8"> merely appears to work, but
     actually breaks the rest of your program.  WHAT?!

Run the included program to see what I mean.  Tsk. :(
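
A quick way to check each of the points above without counting string lengths is to ask PerlIO which layers are actually on a handle. The following is only a minimal sketch: the helper name layers_of is made up for illustration, and it assumes the script is saved as UTF-8. Run it with and without -C, PERL_UNICODE, or the pragmas above and compare what is reported for STDIN and for DATA.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the PerlIO
# layers currently pushed on a handle as one space-separated string.
sub layers_of {
    my ($fh) = @_;
    return join " ", PerlIO::get_layers($fh);
}

# STDIN is one of the handles that -C / PERL_UNICODE is documented to affect.
print "STDIN layers:              ", layers_of(\*STDIN), "\n";

# DATA is the handle in question; see whether anything was applied to it.
print "DATA layers before binmode: ", layers_of(\*DATA), "\n";

# An explicit binmode at run time is the reliable way to get decoding.
binmode(DATA, ":encoding(UTF-8)") or die "binmode DATA: $!";
print "DATA layers after binmode:  ", layers_of(\*DATA), "\n";

__END__
Ʊ  001B1
```

If the report for DATA does not change between runs while the one for STDIN does, that would be consistent with the switch not reaching DATA.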

By the way, one might remind oneself that

    BEGIN { binmode(DATA, ":utf8") }  

is useless because the compiler has no access to DATA.  
One must "of course" write that

    INIT  { binmode(DATA, ":utf8") } 

so that it runs in the interpreter instead.  I mention this latter because I
recall having been chidden here for my habit of writing INIT{}s rather than
BEGIN{}s, a habit I got into because of the compile-once/run-many model of
mod_perl and others.  *Not* everything that needs initted should be done in
a BEGIN: compiler-fu, sure; program-fu, no.
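
The BEGIN-versus-INIT difference can be observed directly. This is a sketch under assumptions, not part of the test program below: the variable names are invented for illustration, openhandle comes from the core Scalar::Util module, and the script is assumed to be saved as UTF-8. It records whether DATA is an open handle at each phase and only applies the layer in INIT.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Illustrative package variables recording what each phase saw.
our ($open_at_begin, $open_at_init);

BEGIN {
    # The parser has not reached __END__ yet, so DATA is not open here
    # and a binmode at this point would have nothing to act on.
    $open_at_begin = defined openhandle(\*DATA) ? 1 : 0;
}

INIT {
    # Compilation has finished, so DATA exists and can take a layer.
    $open_at_init = defined openhandle(\*DATA) ? 1 : 0;
    binmode(DATA, ":encoding(UTF-8)") or die "binmode DATA: $!";
}

print "DATA open at BEGIN time: $open_at_begin\n";
print "DATA open at INIT time:  $open_at_init\n";

# Read only the first line, so length() reflects the layer set in INIT.
my $line = <DATA>;
chomp $line;
printf "First DATA line is %d unit(s) long.\n", length $line;

__END__
Ʊ
```

An ordinary run-time binmode placed before the first read of DATA would do the same job; INIT just keeps the setup next to the other initialization code.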

--tom

----------------- cut here and break your monitor ------------------
use 5.12.0;
use strict;
use warnings;

# for a good time, try each of these...
#use open IO  => ":utf8";  # no effect
#use encoding "utf8";      # "works" but brokenly overrides explicit binmode!
#use utf8;                 # works, and still respects explicit binmode

print "\${^UNICODE} == ${^UNICODE}\n" 
     if ${^UNICODE};            # -Cbits have no effect!

our $data_starts_at = tell DATA;
our $NR = () = <DATA>;

read_data("units");  # default units

print "\nShould be variable width:";
binmode(DATA, ":raw")  || die;
read_data("bytes");

print "\nShould be all-8s width:";
binmode(DATA, ":utf8") || die;
read_data("chars");

sub read_data { 
    my $kind = shift();
    seek(DATA, $data_starts_at, 0);
    print "\n";
    while (<DATA>) {
        chomp;
        printf "Line %2d is %2d %s long.\n",
            $.%$NR||$NR, length, $kind;
    } 
}

__END__
6  00036
Ʊ  001B1
ᴨ  01D28
𐅀  10140
