develooper Front page | perl.perl5.porters | Postings from January 2004

Re: [perl #25263] sort -u [PATCH]

From:
John P. Linderman
Date:
January 27, 2004 19:27
Subject:
Re: [perl #25263] sort -u [PATCH]
Message ID:
200401271455.JAA50115@raptor.research.att.com
> # New Ticket Created by  Dan Jacobson 
> # Please include the string:  [perl #25263]
> # in the subject line of all future correspondence about this issue. 
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=25263 >
> 
> 
> Add unix "sort -u" capability to sort,
> or add an example of the smart way to implement it to perldoc -f sort.

Here's a first cut at a perldoc patch to demonstrate a simple
approach for the common case, and some fancier alternatives
for the general case.  Feel free to edit to taste.  -- jpl

==============================


--- perlfunc.pod.orig	2004-01-26 12:25:35.000000000 -0500
+++ perlfunc.pod	2004-01-27 09:40:43.000000000 -0500
@@ -4819,6 +4819,43 @@
 
     @result = sort { $a <=> $b } grep { $_ == $_ } @input;
 
+There is no built in equivalent of the UNIX C<sort -u> command,
+which, given a B<run> of records that compare equal,
+outputs only one representative record.  For simple lexical
+sorting, a hash can be used to eliminate duplicated keys.
+
+    %hash = map { $_ => 1 } @input;
+    @unique = sort keys %hash;
+
+However, this may not work for more complicated comparison functions,
+since equal keys may be produced from distinct records.
+
+    if (@all = sort { comparator($a,$b) } @input) {
+	$representative = shift(@all);
+	$occurrences = 1;
+	while (@all) {
+	    $item = shift(@all);
+	    if (comparator($representative, $item)) {
+		push(@unique, $representative);
+		$representative = $item;
+		$occurrences = 1;
+	    } else {
+		++$occurrences;
+		$representative = $item
+		   if (rand() * $occurrences >= 1);
+	    }
+	}
+	push(@unique, $representative);
+    }
+
+The code above will choose a representative at random from
+each run of duplicated elements.
+If you are happy to take the first or last duplicated element
+as the representative, you need not keep track of occurrences.
+Just eliminate the C<else> entirely to select the first,
+or make the assignment in the C<else> unconditional to select
+the last.
+
 =item splice ARRAY,OFFSET,LENGTH,LIST
 
 =item splice ARRAY,OFFSET,LENGTH



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About