develooper Front page | perl.perl5.porters | Postings from June 2004

Re: sort in scalar context, but broken due to inexperience.

From:
david nicol
Date:
June 9, 2004 18:33
Subject:
Re: sort in scalar context, but broken due to inexperience.
Message ID:
1086831210.1502.141.camel@plaza.davidnicol.com
On Tue, 2004-06-08 at 09:59, Nick Ing-Simmons wrote:

> But note that 'sorting_av' thing - what sets that?

There are several situations that are all rolled into the
pp_sort function, and it checks flags to determine what it
is doing at various times. The questions are,

What exactly are we sorting? to which the answer is, either the stack
or some designated av in the case of inplace. (if I understood that
right.)

and

How do we call the comparison function?

it seems that pp_sort could be made tighter by figuring out
these things first and then having longer redundant code
rather than shorter code with conditionals.

(We're trading some pipeline stalls for a smaller executable?)

but pp_sort _is_ merely a dispatch function, so it might
matter only in terms of readability. Things like

 sortsv(ORIGMARK+1, max,
       (PL_op->op_private & OPpSORT_NUMERIC)
                 ? ( (PL_op->op_private & OPpSORT_INTEGER)
                     ? ( overloading ? amagic_i_ncmp : sv_i_ncmp)
                     : ( overloading ? amagic_ncmp : sv_ncmp))
                 : ( IN_LOCALE_RUNTIME
                     ? ( overloading
                         ? amagic_cmp_locale
                         : sv_cmp_locale_static)
                     : ( overloading ? amagic_cmp : sv_cmp_static)))


scream to me to be rearranged with all the conditions at the outside
and a series of simpler sortsv calls on the inside, if only to see how 
that looks:


 if (overloading){
  if (PL_op->op_private & OPpSORT_NUMERIC){
   if (PL_op->op_private & OPpSORT_INTEGER){
      sortsv(ORIGMARK+1, max, amagic_i_ncmp)
   }else{ /* not integer */
      sortsv(ORIGMARK+1, max, amagic_ncmp )
   }
  }else{ /* not PL_op->op_private & OPpSORT_NUMERIC */
   if (IN_LOCALE_RUNTIME){
      sortsv(ORIGMARK+1, max, amagic_cmp_locale )
   }else{ /* no LOCALE */
      sortsv(ORIGMARK+1, max, amagic_cmp )
   }
  }
 }else{ /* not overloading */
  if (PL_op->op_private & OPpSORT_NUMERIC){
   if (PL_op->op_private & OPpSORT_INTEGER){
      sortsv(ORIGMARK+1, max, sv_i_ncmp)
   }else{ /* not integer */
      sortsv(ORIGMARK+1, max, sv_ncmp )
   }
  }else{ /* not PL_op->op_private & OPpSORT_NUMERIC */
   if (IN_LOCALE_RUNTIME){
      sortsv(ORIGMARK+1, max, sv_cmp_locale )
   }else{ /* no LOCALE */
      sortsv(ORIGMARK+1, max, sv_cmp )
   }
  }
 }



But even easier to work with would be if we could identify the
comparison function required early on, then use it in both sortsv calls
and also in the sortedness-determinator.  The sortedness-determinator
is really trivial -- call the compare function N-1 times
and tally the nonpositives.  The problem is that pp_sort is a fuzzy
mess of conditionals.


something like

	sortbase = sorting_av? that_av : (ORIGMARK+1);
	comparison_function = 
	    user_provded_comparison_function
              ? whatever_it_is 
              : (PL_op->op_private & OPpSORT_NUMERIC)
                        ? ( (PL_op->op_private & OPpSORT_INTEGER)
                            ? ( overloading ? amagic_i_ncmp : sv_i_ncmp)
                            : ( overloading ? amagic_ncmp : sv_ncmp))
                        : ( IN_LOCALE_RUNTIME
                            ? ( overloading
                                ? amagic_cmp_locale
                                : sv_cmp_locale_static)
                            : ( overloading ? amagic_cmp : sv_cmp_static)));

        sortsv(sortbase, max, comparison_function);

would in my opinion be the easiest to make sense of quickly.


I'm a little bit mystified about how pp_sort manages to be thread-safe, and
curious as to how these optimization flags get set -- is there a pre-pass that
sets all the NVs, or do you have to tell the sort module that you've just
got integers?  I guess it is recursion safe by setting the stack pointer
to the end of the array before calling the sort routine, so sorts in comparison
functions ("Sort these arrays by how sorted they are") work.


To run with the hubris, I'd rewrite pp_sort as follows:

	first, determine what we are going to sort, and how big it is,
	and stop now (returning 1 in scalar context) on singletons and empties.

	then, determine how we are going to sort it

	then, one call to sv_sort (or determine sortedness in scalar context)

	
"Determine how we are going to sort it" is currently in two stages, we
remember if we didn't find a user-provided sort function and then later
do a second conditional based on that result, and all the hints.

But maybe it can't be cleaned any cleaner than it is now because when
there is a user-provded function there are wrappers to choose from and there
are no wrappers with the default functions.  But why couldn't the
sort-how selection code choose the correct wrapper?


Those are my thoughts, continuing thanks for reading this.




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About