develooper Front page | | Postings from September 2002

Explanation of solution

Michael W Thelen
September 9, 2002 12:12
Explanation of solution
Message ID:
Hi all,

Here's an explanation of my solution.  Sorry this is so long, but hopefully
it's clear.  It might not be, because in the course of analyzing it, I
discovered that I didn't understand every detail of my solution as well as I
thought I did.  Corrections are welcome!

I went with the first algorithm I found in a Google search, which uses an stack
of operators and operands.  Like pom, at some point I realized that the desired
output could be achieved just by moving the operators around, so my stack deals
only with operators and parentheses (no operands).

Here's the full solution, which I will break down into parts.  I've replaced
literal tab characters with \t for readability.

#!perl -lp
y/()\t /\t_/d+s@\t|(?<=\w)\D@($"=" $&".$")=~s#_(.*?)\t|(\S)([ */${":"x/\G[+-]/}]*)#$2 #;$+@eg;$_.=$";y/ //s+chop

#perl -lp

-l Use line processing (strips newline from input, adds newline to output).  
-p Automatically loop over <>, and print $_ after each iteration (i.e., once).

y/()\t /\t_/d

Delete all whitespace, and convert "(" to "\t", ")" to "_".  The reason for
converting parentheses will become clear later.

s@\t|(?<=\w)\D@ ... @eg;

The main processing happens inside this substitution.  The left-hand side of
the substitution looks for operators and parens.  These will be replaced with
whitespace and operators off the stack.  Remember that \t represents opening
parens.  All operators and closing parens must be non-digits that follow a
digit or a closing paren.  This avoids matching unary -, which must follow an
operator or an opening paren.  A single-char lookbehind does the trick.  Here
is the reason why ")" was converted to "_"... so this lookbehind could simply
match \w to get digits and closing parens.

($"=" $&$\"")=~s#_(.*?)\t|(\S)([ */${":"x/\G[+-]/}]*)#$2 #;$+

$" is used as a string representation of the stack.  The stack grows in reverse
order (to the left) so that operators can easily be "popped" from the beginning
of the string using a simple substitution.  The current operator is in $&, and
it's pushed on the stack with ($"=" $&$\""). ($"=" $&".$") does the same thing,
but I like the weird interpolation of $".  Then the stack is popped in one of
two ways:

- If $" contains a closing paren ("_"), then $& must have contained a closing
  paren, so pop everything up to the most recent opening paren.  Place 
  everything but the parens back in the original string by matching them and
  returning $+.

- Otherwise, $& contains either an operator or an opening paren ("\t").

  + If $& is an operator, then pop all operators with equal or higher
    precedence.  Do not pop opening parens.  (\S) matches the operator that was
    just pushed (we don't want to pop -that- one yet).  Always pop "*" and "/".
    Only pop "-" if $& is "+" or "-".  "+" does not need to be popped from the
    stack (discovered by accident) because addition is commutative.  Since $& is
    overwritten at this point, check for "+" or "-" using /\G[+-]/.  The ${...}
    construction will interpolate the contents of either $: or ${''} (" \n-" or
    undef, respectively), depending on whether the match is true.  $: is used
    in order to get the character class to match "-" (there are no newlines on
    the stack, and matching whitespace is fine).  $+ will contain all the
    matched operators and whitespace.

  + If $& is an opening paren, then leave it on the stack and return only
    whitespace.  (\S) will match the first operator if there are any on the
    stack (this is why "(" was changed to "\t", because it must not be matched
    here, and \S is very short).  The character class will only match [ */].
    Since two consecutive * / ops cannot be on the stack, and the whitespace
    will be replaced on the right-hand side, this is okay. $+ will only contain
    whitespace, or undef if there were no operators on the stack.

$_.=$";y/ //s+chop

The final cleanup stage.  If there is anything left on the stack at the end,
append it to $_.  Since whitespace was handled rather sloppily, we must also
squash consecutive space characters into single spaces.  Finally, chop off the
final character, which is always a space.  This is why $" was chosen to
represent the stack... it will contain whitespace even if the regex was never
evaluated (i.e., the input contained no operators).  And the right-hand side of
the main regex ensures that the stack will always have whitespace at the end.

-- Mike

Michael W. Thelen
eval unpack u,'M*"1C/2<P,BDN-"`U+C`A(RLG*3U^>2\A+48O82UZ+SME=F%L*"1C+B(G=2HGM+"
<Z,D<U4SU<(B%!.T9=5#HF-5(H)2%%/$907$`Z)B5#.E8U4BM@2&`G+EPD$+R(I.P``' Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About