Re: Raku version of "The top 10 tricks of Perl one-liners" ?!?

Larry Wall
July 22, 2020 19:21
On Sun, Jul 19, 2020 at 09:38:31PM -0700, William Michels via perl6-users wrote:
: Hello,
: I ran across this 2010 Perl(5) article on the Oracle Linux Blog:
: "The top 10 tricks of Perl one-liners"
: Q1. Now that it's a decade later--and Raku (née Perl6) has hit the
: scene--can someone translate the 'top ten tricks' in the blog article
: above into Raku?
: Q2. Are many of the ten Perl(5) one-liner 'tricks' unnecessary in Raku
: (better defaults, more regularized regexes, etc.)?
: Best, Bill.

Yes, and yes.  :-)

More specifically, here's my take.

>   Trick #1: -l
>        Smart newline processing. Normally, perl hands you entire lines,
>        including a trailing newline. With -l, it will strip the trailing
>        newline off of any lines read, and automatically add a newline to
>        anything you print (including via -p).
>        Suppose I wanted to strip trailing whitespace from a file. I might
>        naïvely try something like
>        perl -pe 's/\s*$//'
>        The problem, however, is that the line ends with "\n", which is
>        whitespace, and so that snippet will also remove all newlines from
>        my file! -l solves the problem, by pulling off the newline before
>        handing my script the line, and then tacking a new one on afterwards:
>        perl -lpe 's/\s*$//'

This trick is not needed in Raku, since newlines are stripped by default.  Also,
there are .trim methods that you can use instead of regex.

>    Trick #2: -0
>        Occasionally, it's useful to run a script over an entire file,
>        or over larger chunks at once. -0 makes -n and -p feed you chunks
>        split on NULL bytes instead of newlines. This is often useful for,
>        e.g. processing the output of find -print0. Furthermore, perl -0777
>        makes perl not do any splitting, and pass entire files to your script
>        in $_.
>        find . -name '*~' -print0 | perl -0ne unlink
>        Could be used to delete all ~-files in a directory tree, without
>        having to remember how xargs works.

The key word above is "occasionally", so most of these seldom-used switches are gone.
Also, most of their functions are really easy to do from inside the language.
So these days dividing a file by null chars would typically be handled with:

    for slurp.split("\0") { ... }

>    Trick #3: -i
>        -i tells perl to operate on files in-place. If you use -n or -p with
>        -i, and you pass perl filenames on the command-line, perl will run
>        your script on those files, and then replace their contents with the
>        output. -i optionally accepts a backup suffix as argument; Perl will
>        write backup copies of edited files to names with that suffix added.
>        perl -i.bak -ne 'print unless /^#/'
>        Would strip all whole-line comments from, but leave a copy
>        of the original in

I'm not aware of a direct replacement for this in Raku.  Perl has to be
better at something...

>    Trick #4: The .. operator
>        Perl's .. operator is a stateful operator -- it remembers state
>        between evaluations. As long as its left operand is false, it returns
>        false; Once the left hand returns true, it starts evaluating the
>        right-hand operand until that becomes true, at which point, on
>        the next iteration it resets to false and starts testing the other
>        operand again.
>        What does that mean in practice? It's a range operator: It can be
>        easily used to act on a range of lines in a file. For instance,
>        I can extract all GPG public keys from a file using:
>        perl -ne 'print if /-----BEGIN PGP PUBLIC KEY BLOCK-----/../-----END PGP PUBLIC KEY BLOCK-----/' FILE

The scalar .. operator in Perl translates to the ff operator in Raku.
It's slightly less magical, however, insofar as it won't treat bare
numbers as line numbers in the input.

>    Trick #5: -a
>        -a turns on autosplit mode – perl will automatically split input
>        lines on whitespace into the @F array. If you ever run into any advice
>        that accidentally escaped from 1980 telling you to use awk because
>        it automatically splits lines into fields, this is how you use perl
>        to do the same thing without learning another, even worse, language.
>        As an example, you could print a list of files along with their link
>        counts using
>        ls -l | perl -lane 'print "$F[7] $F[1]"'

This feature was always a bit suspect because it hard-wired a particular
name.  You don't even need a weird name in Raku:

     ls -l | raku -ne 'say "$_[7] $_[1]" given .words'

>    Trick #6: -F
>        -F is used in conjunction with -a, to choose the delimiter on
>        which to split lines. To print every user in /etc/passwd (which is
>        colon-separated with the user in the first column), we could do:
>        perl -F: -lane 'print $F[0]' /etc/passwd

Again, we felt this switch wasn't really pulling its weight, so we pulled it
in favor of explicit split or comb:

     raku -ne 'say $_[0] given .split(":")' /etc/passwd

>    Trick #7: \K
>        \K is undoubtedly my favorite little-known-feature of Perl regular
>        expressions. If \K appears in a regex, it causes the regex matcher to
>        drop everything before that point from the internal record of "Which
>        string did this regex match?". This is most useful in conjunction
>        with s///, where it gives you a simple way to match a long expression,
>        but only replace a suffix of it.
>        Suppose I want to replace the From: field in an email. We could
>        write something like
>        perl -lape 's/(^From:).*/$1 Nelson Elhage <nelhage\>/'
>        But having to parenthesize the right bit and include the $1 is
>        annoying and error-prone. We can simplify the regex by using \K to
>        tell perl we won't want to replace the start of the match:
>        perl -lape 's/^From:\K.*/ Nelson Elhage <nelhage\>/'

Perl's \K becomes <( )> in Raku.  Note that there are other regex changes as well,
and that in the replacement it's not necessary to escape the @ in the absence of brackets:

     raku -pe 's/ ^ "From:" <(.*)> / Nelson Elhage <>/'

The )> is not required to balance there, but helps clarify the intention.  If you do
have a quoting problem in the replacement, you can use the assignment form with
any other form of quoting instead:

     raku -pe 's[ ^ "From:" <(.*)> ] = Q[ Nelson Elhage <>]'
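To see <( )> in action without a real mail header, here's a sketch on made-up input; "Jane Doe" is just a placeholder name. Both the s/// form and the assignment form are shown, and both keep the leading space in the replacement, since everything after "From:" is what gets replaced:

```shell
# Each command should print "From: Jane Doe" followed by the untouched Subject line.
printf 'From: Old Name\nSubject: hi\n' | raku -pe 's/ ^ "From:" <(.*)> / Jane Doe/'
printf 'From: Old Name\nSubject: hi\n' | raku -pe 's[ ^ "From:" <(.*)> ] = Q[ Jane Doe]'
```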

>    Trick #8: $ENV{}
>        When you're writing a one-liner using -e in the shell, you generally
>        want to quote it with ', so that dollar signs inside the one-liner
>        aren't expanded by the shell. But that makes it annoying to use a '
>        inside your one-liner, since you can't escape a single quote inside
>        of single quotes, in the shell.
>        Let's suppose we wanted to print the username of anyone in /etc/passwd
>        whose name included an apostrophe. One option would be to use a
>        standard shell-quoting trick to include the ':
>        perl -F: -lane 'print $F[0] if $F[4] =~ /'"'"'/' /etc/passwd
>        But counting apostrophes and backslashes gets old fast. A better
>        option, in my opinion, is to use the environment to pass the regex
>        into perl, which lets you dodge a layer of parsing entirely:
>        env re="'" perl -F: -lane 'print $F[0] if $F[4] =~ /$ENV{re}/' /etc/passwd
>        We use the env command to place the regex in a variable called re,
>        which we can then refer to from the perl script through the %ENV
>        hash. This way is slightly longer, but I find the savings in counting
>        backslashes or quotes to be worth it, especially if you need to end
>        up embedding strings with more than a single metacharacter.

This is rather Unix-centric on the face of it, since on Windows you'd
have to use outer "" quoting instead.  But you can certainly use the
same trick with Raku, provided you spell ENV right:

     env re="'" raku -ne '(say .[0] if .[4] ~~ / $(%*ENV<re>) /) given .split(":")' /etc/passwd

It probably won't be very efficient though, and doesn't do a thing for readability.
Much easier to use a character name:

     raku -ne '(say .[0] if .[4] ~~ /\c[APOSTROPHE]/) given .split(":")' /etc/passwd

You could backport that trick to Perl using \N{} too, I guess.

>    Trick #9: BEGIN and END
>        BEGIN { ... } and END { ... } let you put code that gets run entirely
>        before or after the loop over the lines.
>        For example, I could sum the values in the second column of a CSV
>        file using:
>        perl -F, -lane '$t += $F[1]; END { print $t }'

Same trick, except you can omit the brackets:

     raku -ne 'my $t += .[1] given .split(","); END say $t'

Note the 'my' is required because strict is the default.

>    Trick #10: -MRegexp::Common
>        Using -M on the command line tells perl to load the given module
>        before running your code. There are thousands of modules available
>        on CPAN, numerous of them potentially useful in one-liners, but
>        one of my favorites for one-liner use is Regexp::Common, which, as
>        its name suggests, contains regular expressions to match numerous
>        commonly-used pieces of data.
>        The full set of regexes available in Regexp::Common is available in
>        its documentation, but here's an example of where I might use it:
>        Neither the ifconfig nor the ip tool that is supposed to replace it
>        provide, as far as I know, an easy way of extracting information for
>        use by scripts. The ifdata program provides such an interface, but
>        isn't installed everywhere. Using perl and Regexp::Common, however,
>        we can do a pretty decent job of extracting an IP from ip's output:
>        ip address list eth0 | \
>          perl -MRegexp::Common -lne 'print $1 if /($RE{net}{IPv4})/'

I don't know if there's anything quite comparable.  And who's to say
what's "common" anymore...   Certainly we have -M.  But Raku's regex
and grammars are so much more powerful that these things are likely to
be kept in more specific Grammar modules anyway, or just hand-rolled for
the purpose on the spot.

