develooper Front page | perl.perl5.porters | Postings from August 2016

Re: [perl #39358] sort with custom subname and prototype ($$)segfaults intermittently

From:
John P. Linderman
Date:
August 11, 2016 16:08
Subject:
Re: [perl #39358] sort with custom subname and prototype ($$)segfaults intermittently
Message ID:
CAC0cEp_giKmwnavBP4RbgMMAH+4dXyoOnERX0UTDCKgjCtjqFQ@mail.gmail.com
I think we are in violent agreement. As you noted, long ago the internal
sort worked directly on an AvARRAY, and, when the sort algorithm was
quicksort, that was not a bad idea. The contents of that array could be
rearranged, but they always contained SV pointers. When I ported Peter
McIlroy's mergesort to Perl, I allocated another array of the same size,
and, during the initial pass, it held pointers into the original array to
delimit runs. On subsequent passes, however, the roles of the two arrays
were reversed, one holding the original SV pointers, the other pointers
into that array. So in every other pass, the original array held pointers
to things that were most assuredly not SV's. Accessing them as though they
were SV pointers generally lead to segfaults.

What I meant to say is that changes could have been made to the sort
routine so that an AvARRAY could continue to be the argument. The sort
could have allocated two extra arrays, and copied the original into one of
the arrays, so the contents of the original would never be anything other
than SV pointers. This would have cost extra copying and extra space, but
this is arguably better than risking segfaults. It appears you have paid
the copying and allocating expenses at a higher level, and I certainly have
no problem with that. It has the effect of making the copy of the array
that is participating in the sort "off limits".

The other point I was trying to make is that access to the SV pointers that
are in the process of being rearranged (now impossible) was always asking
for trouble. In the code in question

$match_indices[$A]

is simply $A, and would have been better written as such. If that
expression referenced a partially rearranged version of the original array,
then the comparison routine would probably "violate the contract". For
example, if the first two elements of @match_indices were reversed, then
the sense of the comparison for elements 0 and 1 would also be reversed,
and the contract demands that the comparison routine be consistent when the
same two elements are compared.

In (not so very) short, comparison routines that reference the array that
is being sorted are probably not doing what the user expects.

On Thu, Aug 11, 2016 at 4:11 AM, Dave Mitchell <davem@iabyn.com> wrote:

> On Wed, Aug 10, 2016 at 05:03:28PM -0400, John P. Linderman wrote:
> > #!/usr/bin/perl -w
> > @scores = (
> >   '0000.0000.00039:0000.0008.0031.',
> >   '0000.0000.00032:0000.0008.0024.',
> >   '0002.0002.00033:0002.0011.0020.',
> >   '0028.0028.00190:0077.0085.0028.');
> > @match_indices = (0,1,2,3);
> > sub sort_by_index($$) {
> >     my($A,$B) = @_;
> >     return $scores[$match_indices[$A]] cmp
> >    $scores[$match_indices[$B]];
> > }
> > @match_indices = sort sort_by_index @match_indices;
> >
> >
> > This could be "fixed" at the expense of allocating *two* extra arrays
> > instead of just one (which is already more than quicksort, which sorts
> "in
> > place").
>
> I don't understand any of the above sentence. Its just been fixed, so why
> is it still "could be"?  No extra arrays are created, and I don't
> understand why creating an extra array would be of benefit.
>
> > But I wonder why people think that the contents of any array in
> > the process of being sorted can participate in any meaningful way in a
> > comparison. Must all sort implementations guarantee that the intermediate
> > contents of the array to be sorted are some permutation of the original
> > array?
>
> In this code:
>
>     @a = sort {...} @a
>
> at the logical perl level, a list is created from the contents of @a,
> sort returns a sorted version of that list, then that list is assigned
> to @a. If @a gets modified during the sort itself (and before the
> assignment), then that's internal optimisation details leaking out.
>
> The internal quicksort and mergesort implementations sort a (C) array
> of SV pointers - usually that array is just a chunk of perl's argument
> stack. Before my commit, they someones also worked directly on an AvARRAY
> array, which turned out to be a bad idea.
>
> > I don't like the idea of penalizing *all* users for the dubious
> > intent of *some* users. Segfaults are evil, of course. A better solution
> > would be to somehow mark an array as "off limits" while a sort is in
> > progress, but that's way beyond my pay grade.
>
> The previous code temporarily marked the array as readonly, which stopped
> modification, but didn't stop access. The only way to stop reading would
> be to temporarily tie the array. But this wuld be semantically wrong:
> there's nothing to imply in the expression
>
>     something = sort { ... } @a
>
> that accessing @a during the sort would be an error.
>
> --
> "But Sidley Park is already a picture, and a most amiable picture too.
> The slopes are green and gentle. The trees are companionably grouped at
> intervals that show them to advantage. The rill is a serpentine ribbon
> unwound from the lake peaceably contained by meadows on which the right
> amount of sheep are tastefully arranged." -- Lady Croom, "Arcadia"
>



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About