Front page | perl.perl5.porters |
Postings from January 2004
Re: [perl #25263] sort -u [PATCH]
From:
John P. Linderman
Date:
January 27, 2004 19:27
Subject:
Re: [perl #25263] sort -u [PATCH]
Message ID:
200401271455.JAA50115@raptor.research.att.com
> # New Ticket Created by Dan Jacobson
> # Please include the string: [perl #25263]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=25263 >
>
>
> Add unix "sort -u" capability to sort,
> or add an example of the smart way to implement it to perldoc -f sort.
Here's a first cut at a perldoc patch to demonstrate a simple
approach for the common case, and some fancier alternatives
for the general case. Feel free to edit to taste. -- jpl
==============================
--- perlfunc.pod.orig 2004-01-26 12:25:35.000000000 -0500
+++ perlfunc.pod 2004-01-27 09:40:43.000000000 -0500
@@ -4819,6 +4819,43 @@
@result = sort { $a <=> $b } grep { $_ == $_ } @input;
+There is no built in equivalent of the UNIX C<sort -u> command,
+which, given a B<run> of records that compare equal,
+outputs only one representative record. For simple lexical
+sorting, a hash can be used to eliminate duplicated keys.
+
+ %hash = map { $_ => 1 } @input;
+ @unique = sort keys %hash;
+
+However, this may not work for more complicated comparison functions,
+since equal keys may be produced from distinct records.
+
+ if (@all = sort { comparator($a,$b) } @input) {
+ $representative = shift(@all);
+ $occurrences = 1;
+ while (@all) {
+ $item = shift(@all);
+ if (comparator($representative, $item)) {
+ push(@unique, $representative);
+ $representative = $item;
+ $occurrences = 1;
+ } else {
+ ++$occurrences;
+ $representative = $item
+ if (rand() * $occurrences >= 1);
+ }
+ }
+ push(@unique, $representative);
+ }
+
+The code above will choose a representative at random from
+each run of duplicated elements.
+If you are happy to take the first or last duplicated element
+as the representative, you need not keep track of occurrences.
+Just eliminate the C<else> entirely to select the first,
+or make the assignment in the C<else> unconditional to select
+the last.
+
=item splice ARRAY,OFFSET,LENGTH,LIST
=item splice ARRAY,OFFSET,LENGTH