develooper Front page | perl.perl5.porters | Postings from August 2008

Re: [PATCH] Add open "|-" and open "-|" to perlopentut

From: Tom Christiansen
Date: August 26, 2008 15:21
Subject: Re: [PATCH] Add open "|-" and open "-|" to perlopentut
Message ID: 4650.1219789248@chthon
In-Reply-To: Message from Aristotle Pagaltzis <pagaltzis@gmx.de> 
   of "Tue, 26 Aug 2008 21:10:34 +0200." <20080826191034.GP32015@klangraum.plasmasturm.org> 

>* Tom Christiansen <tchrist@perl.com> [2008-08-26 20:30]:

>> I have myself no acquaintance with any culture where /^/ means
>> /^/m.

> Virtually every Unix utility processes input linewise, which
> makes `/^/` vs `/^/m` a distinction without a difference. 

A statement sounding more like some tautological exercise than 
a mystery wr

> It is debatable whether that makes `/^/` or `/^/m` the behaviour that
> a shell programmer might more likely expect when strings can contain
> embedded newlines; Damian argues that `/^/m` is more likely to reflect
> the extant expectation. I think he is correct.

Two decades' fluency in Perl makes me a terribly poor test case to judge
extant expectations *myself*, but in the last few classes I've taught to
Perl beginners with existing Unix backgrounds, they've never seemed to
imagine that /^/ matches interstitially.  To them, it's just the beginning
of the string, and that's fine. That is, they are comfortable writing

    perl -ne 'print if /^vt/' /etc/termcap

But then using the rather different /\nvt/ on multiline strings.
Not that that's "right", but it shows their comfort zone.
That said, they're a long ways from *either*

    perl -le 'print for `cat /etc/termcap` =~ /\n(vt.*)/g'

or 

    perl -le 'print for `cat /etc/termcap` =~ /^(vt.*)/gm'

[AND NO, I don't use postfix-foreach almost ever.  This
 is just a demo for a one-liner.]

Now, what really surprises them much more is /$/'s flimsiness, how it
matches the end of string *or* one prior to that if the last character is a
newline.  This leads to the
discussion of how it's a sop to those who forget to chomp (once chop) their
input, just as losing trailing null fields on split is.

If they knew look aheads, you could do something like /(?=\n?\z)/, but it's too
early for that.  You do have to explain that changing $/ affects readline
and chomp, but not the $ fuzziness.  And duck.
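The /$/ behaviour described above is easy to check at the keyboard.  What
follows is only a minimal sketch: the three helper subs are hypothetical
names made up for this illustration, and "foo" is an arbitrary sample word.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub matches_dollar { my ($str) = @_; return $str =~ /foo$/         ? 1 : 0 }
sub matches_z      { my ($str) = @_; return $str =~ /foo\z/        ? 1 : 0 }
sub matches_look   { my ($str) = @_; return $str =~ /foo(?=\n?\z)/ ? 1 : 0 }

for my $str ("foo", "foo\n", "foo\n\n") {
    (my $shown = $str) =~ s/\n/\\n/g;      # make the newlines visible
    printf qq{"%s": \$ => %d, \\z => %d, (?=\\n?\\z) => %d\n},
        $shown, matches_dollar($str), matches_z($str), matches_look($str);
}

# Changing the input record separator affects readline and chomp only;
# the regex engine's $ still looks for a trailing newline, not for ";".
{
    local $/ = ";";
    my $rec = "foo;";
    print "with \$/ = ';', /foo\$/ on 'foo;' => ", ($rec =~ /foo$/ ? 1 : 0), "\n";
    chomp $rec;                            # chomp does honour $/
    print "after chomp => '$rec'\n";
}
```

Only the single trailing newline is forgiven: with two of them, /$/ and
the look-ahead both fail just as \z does.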

But these aren't "hard".  Much harder by *far* is explaining that 
just what strings are matched by 

    /\b$VAR\b/
vs 
    /\B$VAR\B/

depends subtly on the class of the characters at those edges.  They 
get it right only when $VAR starts and ends in \w class characters. 
They get it wrong otherwise.  Thus, if the string is 

    "a = b & c"

They figure that with $VAR as "b" the first matches, and are pretty sure
it does also when it's "a" or "c", with the reverse holding for the
non-boundary case.  The problem is that they also think that with $VAR
as "=", all the same still applies, and little could be further from
the truth.  We work through this demo:

    for $s (split(" ", "a = b & c")) {
        printf qq("%s" =~ /\\b%s\\b/ == %d\n),   ($s) x 2, $s =~ /\b$s\b/;
        printf qq("%s" =~ /\\B%s\\B/ == %d\n\n), ($s) x 2, $s =~ /\B$s\B/;
    }

And its output:

    "a" =~ /\ba\b/ == 1
    "a" =~ /\Ba\B/ == 0

    "=" =~ /\b=\b/ == 0
    "=" =~ /\B=\B/ == 1

    "b" =~ /\bb\b/ == 1
    "b" =~ /\Bb\B/ == 0

    "&" =~ /\b&\b/ == 0
    "&" =~ /\B&\B/ == 1

    "c" =~ /\bc\b/ == 1
    "c" =~ /\Bc\B/ == 0

And it may or may not finally sink in.  They think space should be a
boundary.  

Then, just when they think they have it, you add a separate ";" token at
the end and get the same result as with "=", and this is also a bit
mystifying.

And even were that route available to me, correctly defining /\b/ and /\B/
using look-arounds is just a tad harder than it may look to the less than
intimately initiated.  

Try it, and see what I mean. :-)
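For anyone who wants to compare notes after trying, here is one possible
look-around spelling, offered as an editorial sketch and not as anything
from the thread.  The names $b_look, $B_look, and boundary_results are
invented for the example, and the trailing ";" token is the one mentioned
above.

```perl
use strict;
use warnings;

# Look-around spellings of \b and \B (a sketch under the assumptions above).
my $b_look = qr/(?:(?<=\w)(?!\w)|(?<!\w)(?=\w))/;   # \w on exactly one side
my $B_look = qr/(?:(?<=\w)(?=\w)|(?<!\w)(?!\w))/;   # \w on both sides or neither

# Returns four 0/1 results for one token:
# builtin \b, look-around \b, builtin \B, look-around \B.
sub boundary_results {
    my ($tok) = @_;
    my $q = quotemeta $tok;
    return map { $_ ? 1 : 0 } (
        scalar($tok =~ /\b$q\b/),
        scalar($tok =~ /$b_look$q$b_look/),
        scalar($tok =~ /\B$q\B/),
        scalar($tok =~ /$B_look$q$B_look/),
    );
}

for my $tok (split " ", "a = b & c ;") {
    printf "%-2s \\b=%d look=%d   \\B=%d look=%d\n", $tok, boundary_results($tok);
}
```

The point the students miss is visible in the comments: /\b/ never asks
about whitespace, only whether a \w character sits on exactly one side of
the position, with the ends of the string counting as non-\w.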

> However, I do not follow his argument that this means a Perl programmer
> should use `/m` on every pattern. Neither do I follow your argument
> that because C and shell programmers expect `\t` to mean a literal
> tab regardless of how the string (or character) literal is quoted, one
> should therefore always use double quotes to meet that expectation.
> Perl is not C, neither shell, nor sed, nor awk. Perl is Perl and should
> be treated as Perl.

First, remember that that was neither of my two principal arguments, 
but one of my pair of minor appendices.  My primary argument was and 
remains one of token-visibility, just as we prefer :: over ', while 
my secondary argument involved a more elaborate analysis of Perl's 
pervasively interpolative nature.

But these are arguments in the lofty academic sense only, as I do 
not believe them actually worth *arguing* about.  Little is.

Second, because I *am* very unlikely to confuse Perl for those other things,
I'd make a poor personal test case.  I just have students who make for
decent ones.  And to them, I do hate (ok, enjoy not one blinkin' bit)
explaining that "\t" and '\t' differ, but that m/\t/ and m'\t' do not, or
that "\U$a" and '\U$a' differ, but unlike the \t case where m// doesn't
matter, m/\U$a/ and m'\U$a' indeed differ.   I know why.  You know why.
But I get queasy trying to cleanly explain it off the cuff.  Or cough.
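A small sketch of the part of that which can be shown without surprises.
The variable names are invented for the example ($word stands in for $a),
and the m'\U$a' case is deliberately left out, since what the regex
compiler does with an uninterpolated \U is best checked against your own
perl.

```perl
use strict;
use warnings;

my $tab = "\t";                       # one character: a real tab
my $raw = '\t';                       # two characters: backslash and t

print "length of \"\\t\": ", length($tab), "\n";
print "length of '\\t': ",   length($raw), "\n";

# The regex compiler itself understands \t, so the delimiter makes no difference.
print "m/\\t/ matches a tab: ", ($tab =~ m/\t/ ? 1 : 0), "\n";
print "m'\\t' matches a tab: ", ($tab =~ m'\t' ? 1 : 0), "\n";

# \U belongs to string interpolation, so it acts only where interpolation happens.
my $word = "abc";
my $dq = "\U$word";                   # interpolated and upcased
my $sq = '\U$word';                   # literal text, nothing interpolated
print "double-quoted: $dq\n";
print "single-quoted: $sq\n";
print "m/\\U\$word/ matches 'ABC': ", ("ABC" =~ m/\U$word/ ? 1 : 0), "\n";
```

The asymmetry comes from who does the work: \t is understood both by the
string parser and by the regex compiler, while \U is understood by the
string parser alone.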

I'll say this: Damian's PBP advice is probably 95% applicable to 95% of the
population, which narrowly puts it just a quarter-point above Sturgeon's
Law. :-)  Of course, if those WAGs are off and both should be 90%, then
we've fallen below it.  Still, I've no doubt his advice does more good than
bad.  My own occasional petty disagreements with it are few and minor,
and completely unworthy of being aired in public.  Most of what he says is
stuff I've always done anyway.  It's a far more elaborate treatment of the
perlstyle manpage than I would ever have done, but that's immaterial.  He
does say things that make me think, and that's a good thing.  Exception 
objects were a new take that seems sound in large projects, although
I've yet to find a programmer who pads with leading 0s.  Maybe Damian
hangs around with more COBOL programmers than I do, though.

For many years, though, I've found myself more and more using //x always,
and his notion of using [.] for \., or [*] for \*, for the sake of
legibility by avoiding the dratted sudilos [v.i.:-] is an interesting one,
and certainly not without merit.

    % perl -WE 'say $& if "*" =~ /[*]/'
    *

    % perl -WE 'say $& if "-" =~ /[-]/'
    -

    % perl -WE 'say $& if "." =~ /[.]/'
    .

    % perl -WE 'say $& if "{" =~ /[{]/'
    {

But it should still be looked at with a critical eye, for 
not only doesn't it always work:

    % perl -WE 'say $& if "^" =~ /[^]/'
    Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ^]/ at -e line 1.
    Exit 255

    % perl -WE 'say $& if "/" =~ /[/]/'
    Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE / at -e line 1.
    Exit 255

Worse still, it sometimes works for reasons far from what they appear,
as in this very non-parallel pair:

    % perl -WE 'say $& if "[" =~ /[[]/'
    [

    % perl -WE 'say $& if "]" =~ /[]]/'
    ]
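Why that pair is non-parallel can be seen by putting a second character
into each class.  This is a minimal sketch; matched_by and $probe are
names invented for the illustration.

```perl
use strict;
use warnings;

# Returns the characters from $chars that the compiled class matches.
sub matched_by {
    my ($class, $chars) = @_;
    return join "", grep { /$class/ } split //, $chars;
}

my $probe = "[]a^";

# "[" is not special inside a class here, so /[[]/ is an ordinary one-member class.
print "[[]   matches: ", matched_by(qr/[[]/,  $probe), "\n";
# "]" is literal only because it comes first; /[]a]/ is ONE class, not "[]" then "a]".
print "[]a]  matches: ", matched_by(qr/[]a]/, $probe), "\n";
# "^" is literal anywhere but first, where it negates; backslashed it is always literal.
print "[a^]  matches: ", matched_by(qr/[a^]/, $probe), "\n";
print "[\\^]  matches: ", matched_by(qr/[\^]/, $probe), "\n";
```

So /[[]/ works because "[" needs no protection there, while /[]]/ works
only through the special rule for a leading "]", which is exactly why the
same trick falls over for "^".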

I'm not saying not to use his advice here; Damian's written enough regex
code that he must have found it helpful for him.  I'm just saying to know
when the trick works and when (and how) it fails.  On the other hand,
backslashing instead always works.  But I wholly agree that it's terribly
ugly and risks confusion.  That's why we have all these pick-your-own-quote
constructs in q, qq, qx, qr, s, m, tr, and y, plus any others you've added
while I wasn't looking. :-)

Have fun passing only a single backslash there on the RHS of the =~ above,
for there the single quotes will not avail you.  Choosing different quotes
changes nothing; you *shall* use a bonus backslash, and please have a nice
day.  Alone amongst quoting mechanisms, only a \heredoc saves you (well, or
a 'heredoc') from falling into the double-slackbashing abyss:

     1	use 5.010_000;
     2	
     3	warn $& if  '\\'   =~ /[\\]/;
     4	warn $& if  "\\"   =~ /[\\]/;
     5	warn $& if q{\\}   =~ /[\\]/;
     6	
     7	warn $& if  '\\'   =~  /\\/;
     8	warn $& if  "\\"   =~ m{\\};
     9	warn $& if q{\\}   =~ m'\\';
    10	
    11	warn $& if <<\EOF  =~  /\\/;
    12	Drat this \backslash thing!
    13	EOF
    14	
    15	warn $& if <<"EOF" =~  /\\/;
    16	Drat this \\backslash thing!
    17	EOF
    18	
    19	warn $& if <<`EOF` =~  /\\/;
    20	perl -e 'printf "Drat this %cbackslash thing!", 2**2*23'
    21	EOF

Which executed says

    \ at /tmp/x line 3.
    \ at /tmp/x line 4.
    \ at /tmp/x line 5.
    \ at /tmp/x line 7.
    \ at /tmp/x line 8.
    \ at /tmp/x line 9.
    \ at /tmp/x line 11.
    \ at /tmp/x line 15.
    \ at /tmp/x line 19.

Despite using /x for visual chunking almost always, I do continue to
use the s&m flags not out of reflex, but out of careful deliberation,
varying on a case-by-case basis.  But this may be violating rule 
#1, which is "Never [try to] be [too] clever", otherwise you'll never 
understand it once reduced to 1/10th or 1/100th the memory, time, or
inherent capability.

--tom

PS: In the tradition of the Bourne shell :-), I still think \ should 
    be called a SUDILOS, as that's a much better name for it than a 
    HSALS or an ELUGRIV, now wouldn't you all agree? :-)  

    Plus, it always makes me sweat a bit to decode it, an act which 
    SUDILOS somewhat recalls if your bent lies in an Iberian direction, 
    making it ¡SÚDELOS! imperatively (mea culpa for the double-pun ;-)

	http://buscon.rae.es/draeI/SrvltConsulta?TIPO_BUS=3&LEMA=sudar
