develooper Front page | perl.perl5.porters | Postings from July 2021

Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}

Thread Previous | Thread Next
From:
=?UTF-8?Q?Salvador_Fandi=c3=b1o?=
Date:
July 7, 2021 08:38
Subject:
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
Message ID:
069d6385-4699-a138-0028-728bbc63a73f@gmail.com
On 6/7/21 22:32, Smylers wrote:
> Salvador Fandiño writes:
> 
>> Reducing key comparison to just numeric or string comparison is not
>> flexible enough (or at least not user-friendly enough).
> 
> True, though other objects can override <=> or cmp, which provides the
> opportunity for any custom sorting you want.
> 
>> For a start you want to support having multiple keys, every one with
>> its own comparison operation and direction.
> 
> Good point. But it still seems a shame to have to specify each key
> twice, once for $a and once for $b.
> 
>> BTW, regarding direction, sorting and reversing is not equivalent to
>> sorting in the opposite direction.
> 
> Oh, I thought that Perl special-cased C<reverse sort> so that it *is*
> equivalent to sorting in the opposite direction. Have you an example
> where it isn't?

Yes, for instance:

   use Data::Dumper qw(Dumper);
   @a=(1.0, 1.1, 2, 1.3, 2.4, 2.1);

   print(Dumper [sort { int($b) <=> int($a) } @a]);
   print(Dumper [reverse sort { int($a) <=> int($b) } @a]);


Which outputs:

   $VAR1 = [
             2,
             '2.4',
             '2.1',
             1,
             '1.1',
             '1.3'
           ];
   $VAR1 = [
             '2.1',
             '2.4',
             2,
             '1.3',
             '1.1',
             1
           ];


The difference is that elements with the same key are reversed. Or in 
other words, "reverse sort {...}" is not an stable sorting algorithm.


>> So if you provide "numsort" you also need "descnumsort".
> 
> I explicitly suggesting using reverse to avoid a proliferation of
> functions.
> 
>> Finally, check Sort::Key family of modules. They support almost
>> everything related to key based sorting but at the expense of
>> providing a myriad of functions in order to cover all the different
>> key type and direction combinations.
> 
> Yeah, that's too many functions. But, looking specifically at the main
> Sort::Key module: https://metacpan.org/pod/Sort::Key
> 
> • The various r*sort functions should be handle-able by reverse.
> • In-place sorting is a different matter to that being discussed in this
>    thread, so the *sort_inplace functions aren't relevant here.
> • Numerically sorting specifically as signed or unsigned integers is a
>    niche requirement that doesn't fit well with core Perl, which mostly
>    just deals with numbers but nothing more specific, so the *i*sort and
>    *u*sort functions are probably not relevant to this thread either.

Yes, the main reason Sort::key supports different numeric types is to be 
as fast as possible.


> That leaves:
> 
> • keysort, which is equivalent to what I called asort above
> • nkeysort, equivalent to what I called nsort above
> • nsort, which is just nkeysort without a block; given that Perl's
>    existing sort function manages to have an optional block, a core
>    nsort presumably could as well, meaning that nkeysort and nsort could
>    be just one function
> 
> So that's two functions. Which isn't one, but nor is it the myriad that
> Sort::Key provides, while still covering many of its scenarios. It may
> be that that's *enough* common cases that it's still worth doing.
> 
> As you point out above, it doesn't cover the Sort::Key::Maker
> functionality of sorting by, say, descending age and then alphabetical
> names where age is equal. With core sort that currently involves
> something like:
> 
>      sort { $b->age <=> $a->age || $a->name cmp $b->name } @list
> 
> Or with that module:
> 
>      use Sort::Key::Maker
>          sort_year_groups => sub { $_->age, $_->name }, qw<int string>;
>      sort_year_groups @list;
> 
> Providing that with a built-in sort function and not repeating $a and $b
> would require something *equivalent to*:
> 
>      ksort [-n => sub { $_->age }, a => sub { $_->name }], @list;
> 
> or:
> 
>      ksort sub :numeric:desc { $_->age }, sub { $_->name }, \@list;
> 

> That is, a list of pairs of type/direction indicators and
> keys, followed by the list of things to sort.
> 
> But there isn't really a syntax for that which fits in with any other
> core function:
> 
> • It needs two separate lists: the spec list, then the list of items. I
>    don't think there's any other core function that handles multiple
>    lists.
> 
> • It needs (potentially) multiple blocks. I've put explicit sub-s in the
>    example above. Core functions taking a block don't require the sub
>    prefix, but only take one block, in a known place (and directly as an
>    arg, not nested inside an array).
> 
> • It needs a mini-language or options syntax for specifying the
>    comparison type and direction. There are core functions with something
>    similar, but none that directly seem to apply here:
> 
>    » pack/unpack and (s)printf take a (non-optional) initial spec sting
>      that's specific to the function; but a single string does it,
>      without need for blocks.
> 
>    » open takes an optional mode string. That can itself have optional
>      layers applied to it. open's API should probably not be copied by
>      anything, ever.
> 
>    » seek's final argument is the WHENCE indicator, which to be specified
>      symbolically needs importing from a separate module. Having to
>      import constants from a module isn't a big advantage over using a
>      function or function factory from a module.
> 
>    » //, s///, and tr/// have suffix option letters, which only work
>      because of the specific quote-like syntax of those operators. split
>      also uses some of these, but again only because the // syntax
>      enables it. There isn't a place to put suffix options on most
>      functions.
> 
>    Perl doesn't have a general ‘here are some options for this function’
>    syntax which can be applied here.
> 
> So any new sort routine which avoids the need for $a and $b by only
> specifying one of them (with $_) and still has full flexibility for
> multiple keys would still need some new syntax which isn't (currently)
> used by any other function.

Well, actually it doesn't!

You can just chain key sorts. For instance:

   @sorted = numsort { $_->{foo} }
             descnumsort { $_->{bar} }
             sort { $_->{doz} }
             @data;

Now it is just(!) a matter of having the interpreter recognize such 
construction and optimize it into just one efficient sorting operation.

Note that I am not saying that this is the best solution, just that it 
is possible. Well, actually I find the code above quite ugly and would 
prefer something like this:

   @sorted = sort :numeric { $_->{foo} }
             sort :numeric :desc { $_->{bar} }
             sort { $_->{doz} }
             @data;

A drawback of chaining several sort calls is that you can not reuse code 
from one key extraction block to the next. For instance, something 
pretty common is using a regular expression to extract several keys as in:

   multikeysort { split(/,/, $_)[3,4] } @data;

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About