perl.beginners | Postings from March 2002

Random Sampling in Perl

From: Balint, Jess
Date: March 18, 2002 14:10
Subject: Random Sampling in Perl
Message ID: CA8ED43817EFD211855E00805FE6FD7403AA1AD7@scadmail2.alldata.net

Hello all, I have a file of 3,210,008 CSV records, and I need to take a
random sample of it. I hacked something together a while ago, but the output
seemed to contain only 65,536 distinct records, repeated over and over. When
I need a 5mil sample, this creates a problem.

Here is my old code. I know the logic allows dups, but what would cause the
limit? With 500,000 samples I would expect to get more than 65,536 different
records, and that number (2^16) is too much of a coincidence for me to
ignore. Thanks.

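#!/usr/local/bin/perl -w

# Stop early if either file cannot be opened
open (FILE,"consumer.sample.sasdump.txt") or die "Can't read input: $!";
open (NEW,">consumer.new") or die "Can't write output: $!";

# Read every record into memory
@data = <FILE>;

# '=' (assignment), not '==' (comparison); start at 0 for 500,000 draws
for ( $jess = 0; $jess < 500000; $jess++ ) {
	$index = rand @data;	# fractional index is truncated by the array lookup
	print NEW $data[$index];
}

close(FILE);
close(NEW);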
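
A possible explanation for the 65,536 limit, which the post itself does not confirm: on some older Perl builds a single rand() call carries only 15 or 16 random bits (perl -V:randbits reports the figure), so rand @data can land on at most 2^16 distinct indexes no matter how large @data is. The sketch below is one way around both that limit and the duplicates, not a definitive fix. It uses reservoir sampling (Algorithm R), a swapped-in technique rather than anything from the original code, and the names big_rand and reservoir_sample are hypothetical helpers invented for illustration.

```perl
#!/usr/local/bin/perl -w
use strict;

# Hypothetical helper: build a 45-bit random integer from three 15-bit
# draws, so the result does not depend on how many random bits a single
# rand() call provides. Returns an integer in 0 .. $n-1.
sub big_rand {
    my ($n) = @_;
    my $r = 0;
    $r = $r * 32768 + int(rand(32768)) for 1 .. 3;
    return int($r / 2**45 * $n);
}

# Reservoir sampling (Algorithm R): read the filehandle once and keep $k
# lines chosen uniformly, with no line picked twice. Only $k lines are
# held in memory, so the whole file never has to be slurped into an array.
sub reservoir_sample {
    my ($fh, $k) = @_;
    my @sample;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@sample < $k) {
            push @sample, $line;            # fill the reservoir first
        } else {
            my $j = big_rand($seen);        # 0 .. $seen-1
            $sample[$j] = $line if $j < $k; # replace with probability $k/$seen
        }
    }
    return @sample;
}
```

To use it on the file from the post, open consumer.sample.sasdump.txt on a lexical filehandle, call reservoir_sample($fh, 500000), and print the returned list to consumer.new. Because no record is picked twice, the sample can never be larger than the file; if fewer than $k lines exist, all of them are returned.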


