perl.perl5.porters | Postings from April 2007

On @rebus=<<`HIC`:Latin1 (Was: OK, but what about me?)

From: Tom Christiansen
Date: April 23, 2007 09:30
Subject: On @rebus=<<`HIC`:Latin1 (Was: OK, but what about me?)
Message ID: 30633.1177345785@chthon

> "Dr.Ruud" <rvtol+news@isolution.nl> writes:

>> Brain fart:
>>
>>     my $foo = <<'FOO' :koi8-r;
>> raw koi8-r data here
>> FOO

> Brilliant, except for the complication that the end-of-heredoc string
> must be encoded in the foreign encoding (which I think is a solvable
> problem).

Do you mean that "FOO" would have to be in koi8-r rather than in the
script's own encoding, if any?

Yes, I agree it looks nifty; it's nice that it should use the existing 
though little-used :attributes syntactic slot.  But I think it leads
to some curious questions.

First of all, one wonders whether this wouldn't lead to more general,
per-literal encoding specs, such as C<'string':enc>, C<q!string!:enc>, 
(or even C<q:string:enc> skipping the dup colon?), and others of that ilk?

Even without :enc being applicable to general literals, though, what 
about interpolated data from C< <<"FOO" >?  That is, would something 
like C<"str1 $var str2":euc-tw>, written heredockishly as  

    my $foo = <<"FOO" :euc-tw;
    str1 $var str2
    FOO

mean:  (line endings aside)

     decode(euc_tw => 'str1 ') 
   . $var 
   . decode(euc_tw => ' str2') 

Or would it instead mean:

     decode euc_tw => ('str1 ' . $var . ' str2')

Which goes first?
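
To make the two readings concrete with today's Perl, here is a minimal sketch using the core Encode module. It assumes, purely for illustration, that the literal pieces are raw KOI8-R bytes (chosen because Encode ships it) and that $var already holds decoded characters; the variable names are hypothetical. Under those assumptions the two orders give different strings, because decoding after interpolation also reinterprets $var as KOI8-R bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumptions for illustration only: the literal text is raw KOI8-R bytes,
# and $var already holds decoded characters (here U+00E9, e-acute).
my ($lit1, $lit2) = ("\xC1 ", " \xC2");   # KOI8-R bytes for U+0430, U+0431
my $var = "\xE9";

# Reading 1: decode each literal piece, then interpolate $var untouched.
my $pieces_first = decode('koi8-r', $lit1) . $var . decode('koi8-r', $lit2);

# Reading 2: interpolate first, then decode the whole string,
# which treats the contents of $var as KOI8-R bytes too.
my $whole_last = decode('koi8-r', $lit1 . $var . $lit2);

# Show the code points of each result, joined by dots.
printf "pieces first: %vX\n", $pieces_first;
printf "whole last:   %vX\n", $whole_last;
```

If $var held raw bytes in the same encoding instead, both readings would agree, so the question only bites when decoded and undecoded data get mixed.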

Mmm, doesn't this mean we'd get to specify an encoding on readpipe? 
I think it does!

    my $rebus = <<`HIC`:Latin1;
    cmd1 $var | cmd2
    cmd3
    HIC

Yum!  :-)

You know, that's almost even somewhat appealing--compared with 
the alternative: 

    my $rebus = do { 
        open(my $rdpipe, "-|:encoding(Latin1)", "cmd1 $var | cmd2; cmd3");
        local $/;
        <$rdpipe>; 
    };

Although certainly the simpler 

    my $rebus = `cmd1 $var | cmd2; cmd3` :Latin1;

or, if you must, 

    my $rebus = qx(cmd1 $var | cmd2; cmd3) :Latin1;

would be easier on the eye and mind than C< <<`HIC`:Latin1 > would.
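
For comparison, here is a runnable sketch of what can be done without any new syntax: reading a command's output through an :encoding layer on a read pipe, and reading raw bytes and decoding them explicitly. The command is a hypothetical stand-in (a child perl that prints one Latin-1 byte), not the cmd1/cmd2/cmd3 pipeline from the message, and the list form of open is used so no shell quoting is involved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for "cmd1 $var | cmd2; cmd3": a child perl that
# emits the single Latin-1 byte 0xE9 followed by a newline.
my @cmd = ($^X, '-e', 'binmode STDOUT; print chr(0xE9), "\n"');

# Way 1: let an :encoding layer on the read pipe do the decoding.
my $via_layer = do {
    open(my $rdpipe, '-|:encoding(Latin1)', @cmd)
        or die "cannot start command: $!";
    local $/;                       # slurp mode
    <$rdpipe>;
};

# Way 2: read raw bytes from the pipe, then decode them explicitly.
my $via_decode = do {
    open(my $rdpipe, '-|:raw', @cmd)
        or die "cannot start command: $!";
    local $/;
    decode('Latin1', scalar <$rdpipe>);
};

print "same result\n" if $via_layer eq $via_decode;
```

Both routes decode the command's output only after the command line itself has been built, which is the same ordering the message goes on to settle on.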

Hm, looking at the command-interpolated version, it now seems pretty
obvious that variable interpolation must occur before "de-"encoding
(er, "en-"decoding? I just can't keep those two straight in my head!), 
so that would mean 

     my $rebus = decode Latin1 => qx(cmd1 $var | cmd2; cmd3);

So I guess that clears up the order of operations on the prospective
C< <<"HIC":Latin1 > case, doesn't it? 

    my $rebus = <<"HIC" :Latin1;
    str1 $var str2
    HIC

would be

     my $rebus = decode Latin1 => "str1 $var str2";

Hm...

    my @rebus = <<`HIC` :Latin1;
    str1 $var str2
    HIC

In Latin1, there's no trouble, but I'd have to unwrap that to 
see when the implicit line-breaking split ran.  

    my @rebus = split( /(?<=\n)/, decode(Latin1 => `str1 $var str2`) );

I wonder a little about other line terminators in very funky encodings.
Let's say Jis0212-RAW had \v stuff far beyond \n.  Would

    my @lines = <<`FOO` :jis0212-raw;
    str1 $var str2
    FOO

be therefore

    my @lines = split( /(?<=\n)/, decode(jis0212_raw => `str1 $var str2`) );

Hm, looks like I'm relying on split losing the trailing null field there.
I guess I could write the regex as /(?<=\n)(?=.)/s so split doesn't have to go
to the extra work of splitting the last thing and then throwing it away.
Hm, maybe using \R might be better:

    my @lines = map { decode jis0212_raw => $_ } 
                split( /\R\K(?=.)/s, `str1 $var str2`);

Oh, never mind; the qx// implicit split doesn't use \n; it uses $/ 
(which is a bit of a bother to put in a m//).  So that's just: 

    my @lines = split( m[(?<=\Q$/\E)(?=.)]s, 
                       decode(jis0212_raw => `str1 $var str2`) );
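
The record-splitting step can be tried on its own, without any command or encoding involved. The sketch below uses a hypothetical helper, split_records, that takes already-decoded text and a separator (defaulting to $/) and keeps each terminator attached to the end of its record, as qx// does in list context. It assumes a non-empty, fixed-string separator and does not handle paragraph mode or an undefined $/.

```perl
use strict;
use warnings;

# Hypothetical helper, not an existing function: split decoded text into
# records that each keep their terminator.
sub split_records {
    my ($text, $sep) = @_;
    $sep = $/ unless defined $sep;
    my $q = quotemeta $sep;
    # Split just after each separator (lookbehind), but only where at
    # least one more character follows (lookahead), so no empty trailing
    # field is ever produced.
    return split /(?<=$q)(?=.)/s, $text;
}

my @lines = split_records("str1\nstr2\nstr3\n", "\n");
print scalar(@lines), " records\n";
print map { "[$_]" } @lines;
```

Interpolating a quotemeta'd copy of the separator sidesteps the bother of writing $/ directly inside the pattern.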

> I'm surprised that no one responded to this suggestion.

I'd noticed only Juerd's original, not the <<FOO:koi reply, because 
in my hastiness, I carelessly ran % scan `pick -subj Smack` and 
so missed the intriguing reply. 

Thanks, Johan!  Glad you flagged it.  Fun stuff, eh? :-)

--tom

PS: Now that I think of it, those pod markups would be better 
    written as C<<< <<"HIC" >>> instead of C< <<"HIC" >, because
    the space you get around the string varies.  This:

	% ( echo "=head1 WITNESS" ; echo preamble 'I<<< <<"HIC":Latin1 >>>' postamble ) | pod2text
	WITNESS
	preamble *<<"HIC":Latin1* postamble

    is probably better than this:

	% ( echo "=head1 WITNESS" ; echo preamble 'I< <<"HIC":Latin1 >' postamble ) | pod2text
	WITNESS
	preamble * <<"HIC":Latin1 * postamble




