Front page | perl.golf |
Postings from September 2002
Explanation of solution
From:
Michael W Thelen
Date:
September 9, 2002 12:12
Subject:
Explanation of solution
Message ID:
20020909131151.D2564@attensity.com
Hi all,
Here's an explanation of my solution. Sorry this is so long, but hopefully
it's clear. It might not be, because in the course of analyzing it, I
discovered that I didn't understand every detail of my solution as well as I
thought I did. Corrections are welcome!
I went with the first algorithm I found in a Google search, which uses an stack
of operators and operands. Like pom, at some point I realized that the desired
output could be achieved just by moving the operators around, so my stack deals
only with operators and parentheses (no operands).
Here's the full solution, which I will break down into parts. I've replaced
literal tab characters with \t for readability.
#!perl -lp
y/()\t /\t_/d+s@\t|(?<=\w)\D@($"=" $&".$")=~s#_(.*?)\t|(\S)([ */${":"x/\G[+-]/}]*)#$2 #;$+@eg;$_.=$";y/ //s+chop
--------------------------------
#perl -lp
-l Use line processing (strips newline from input, adds newline to output).
-p Automatically loop over <>, and print $_ after each iteration (i.e., once).
--------------------------------
y/()\t /\t_/d
Delete all whitespace, and convert "(" to "\t", ")" to "_". The reason for
converting parentheses will become clear later.
--------------------------------
s@\t|(?<=\w)\D@ ... @eg;
The main processing happens inside this substitution. The left-hand side of
the substitution looks for operators and parens. These will be replaced with
whitespace and operators off the stack. Remember that \t represents opening
parens. All operators and closing parens must be non-digits that follow a
digit or a closing paren. This avoids matching unary -, which must follow an
operator or an opening paren. A single-char lookbehind does the trick. Here
is the reason why ")" was converted to "_"... so this lookbehind could simply
match \w to get digits and closing parens.
--------------------------------
($"=" $&$\"")=~s#_(.*?)\t|(\S)([ */${":"x/\G[+-]/}]*)#$2 #;$+
$" is used as a string representation of the stack. The stack grows in reverse
order (to the left) so that operators can easily be "popped" from the beginning
of the string using a simple substitution. The current operator is in $&, and
it's pushed on the stack with ($"=" $&$\""). ($"=" $&".$") does the same thing,
but I like the weird interpolation of $". Then the stack is popped in one of
two ways:
- If $" contains a closing paren ("_"), then $& must have contained a closing
paren, so pop everything up to the most recent opening paren. Place
everything but the parens back in the original string by matching them and
returning $+.
- Otherwise, $& contains either an operator or an opening paren ("\t").
+ If $& is an operator, then pop all operators with equal or higher
precedence. Do not pop opening parens. (\S) matches the operator that was
just pushed (we don't want to pop -that- one yet). Always pop "*" and "/".
Only pop "-" if $& is "+" or "-". "+" does not need to be popped from the
stack (discovered by accident) because addition is commutative. Since $& is
overwritten at this point, check for "+" or "-" using /\G[+-]/. The ${...}
construction will interpolate the contents of either $: or ${''} (" \n-" or
undef, respectively), depending on whether the match is true. $: is used
in order to get the character class to match "-" (there are no newlines on
the stack, and matching whitespace is fine). $+ will contain all the
matched operators and whitespace.
+ If $& is an opening paren, then leave it on the stack and return only
whitespace. (\S) will match the first operator if there are any on the
stack (this is why "(" was changed to "\t", because it must not be matched
here, and \S is very short). The character class will only match [ */].
Since two consecutive * / ops cannot be on the stack, and the whitespace
will be replaced on the right-hand side, this is okay. $+ will only contain
whitespace, or undef if there were no operators on the stack.
--------------------------------
$_.=$";y/ //s+chop
The final cleanup stage. If there is anything left on the stack at the end,
append it to $_. Since whitespace was handled rather sloppily, we must also
squash consecutive space characters into single spaces. Finally, chop off the
final character, which is always a space. This is why $" was chosen to
represent the stack... it will contain whitespace even if the regex was never
evaluated (i.e., the input contained no operators). And the right-hand side of
the main regex ensures that the stack will always have whitespace at the end.
-- Mike
--
Michael W. Thelen
eval unpack u,'M*"1C/2<P,BDN-"`U+C`A(RLG*3U^>2\A+48O82UZ+SME=F%L*"1C+B(G=2HGM+"
<Z,D<U4SU<(B%!.T9=5#HF-5(H)2%%/$907$`Z)B5#.E8U4BM@2&`G+EPD$+R(I.P``'
-
Explanation of solution
by Michael W Thelen