develooper Front page | perl.beginners | Postings from February 2009

processing large datafiles

From: Pedro Soto
Date: February 17, 2009 09:07
Subject: processing large datafiles
Message ID: 381d8e990902170906k5cafc398h4e0a28ff1d6a9229@mail.gmail.com
Dear all,
I need to read a huge file and then write out only the columns that match
IDs from another file (which has fewer IDs), in sorted order.
I wrote a script that does the work, but it takes a lot of time. I tried
the script with a few columns from the huge file and it took 5 sec to do
the job. Because I have over 403 000 IDs, I estimated roughly 3 hr
to run the complete files, but the script is taking longer than that.
I wonder if someone has a better way to do this... I really need to
write the huge file out sorted by ID. Any help will be greatly
appreciated.
Here is the code:

#!/usr/local/bin/perl
use warnings;
use strict;

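# Read the map file: chromosome -> positions, and position -> locus name.
open(MAP,"file.map") || die "Cannot open file.map: $!";
my %map;
my %locus;

while(<MAP>) {
    chomp;
    my @snp = split /\s+/;
    next if $snp[0] =~ /Chromosome/;   # skip the header line
    push(@{$map{$snp[0]}},$snp[3]);
    $locus{$snp[3]} = $snp[2];
}
close MAP;

# Load the whole CSV into memory; row 0 is the header row.
open(IN,"trialped.csv") || die "Cannot open trialped.csv: $!";
my @AoA = ();
while(<IN>) {
    chomp;
    my @temp = split /,/;
    push(@AoA,[@temp]);
}
close IN;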

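my $out1 = "outfile.txt";
# Assumption: $sca (never set in the script as posted) is the number of
# columns in the header row.
my $sca = scalar @{$AoA[0]};

open(OUT1,">$out1") || die "Cannot open $out1: $!";
for (my $x=1;$x<=$#AoA;$x++) {
    print OUT1 "$x $AoA[$x][0] 0 0 0 1\t";
    # chromosomes in numeric order, then positions within each chromosome
    foreach my $k (sort {$a <=> $b} keys %map) {
        foreach my $val (sort {$a <=> $b} @{$map{$k}}) {
            # linear search of the header row for this locus name
            for (my $y=1;$y<$sca;$y++) {
                if ($locus{$val} eq $AoA[0][$y]) {
                    print OUT1 "$AoA[$x][$y]";   # write the matching column
                    last;
                }
            }
        }
    }
    print OUT1 "\n";
}
close OUT1;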
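
One possible way to cut the running time, given only as a rough sketch and not a tested solution for the real files: the loops above re-sort the map and scan the header row again for every data row. The sorted column order can instead be worked out once from the header, and the CSV can then be read and written one line at a time. The sub names read_map, column_order and write_sorted are made up for this illustration. The sketch assumes the first CSV line is the header and, like the loop above, prints the selected values with no separator between them.

```perl
use strict;
use warnings;

# Read the map file into the same two structures used above:
# %map (chromosome -> list of positions) and %locus (position -> locus name).
sub read_map {
    my ($fh) = @_;
    my (%map, %locus);
    while (my $line = <$fh>) {
        chomp $line;
        my @snp = split /\s+/, $line;
        next if $snp[0] =~ /Chromosome/;    # skip the header line
        push @{ $map{ $snp[0] } }, $snp[3];
        $locus{ $snp[3] } = $snp[2];
    }
    return (\%map, \%locus);
}

# Work out the output column order once: chromosomes ascending, positions
# ascending within each chromosome, keeping only loci found in the header.
sub column_order {
    my ($map, $locus, $header) = @_;
    my %col_of;                              # header name -> column index
    for my $i (1 .. $#{$header}) {
        # keep the first column if a name is repeated
        $col_of{ $header->[$i] } = $i unless exists $col_of{ $header->[$i] };
    }
    my @order;
    for my $chr (sort { $a <=> $b } keys %$map) {
        for my $pos (sort { $a <=> $b } @{ $map->{$chr} }) {
            my $col = $col_of{ $locus->{$pos} };
            push @order, $col if defined $col;
        }
    }
    return @order;
}

# Stream the CSV: one row in memory at a time, columns picked by array slice.
sub write_sorted {
    my ($in, $out, $map, $locus) = @_;
    my $header = <$in>;
    return unless defined $header;           # empty input, nothing to write
    chomp $header;
    my @order = column_order($map, $locus, [ split /,/, $header ]);
    my $x = 0;
    while (my $line = <$in>) {
        chomp $line;
        my @f = split /,/, $line;
        $x++;
        print {$out} "$x $f[0] 0 0 0 1\t", join('', @f[@order]), "\n";
    }
    return;
}
```

To use it, open file.map, trialped.csv and outfile.txt as lexical filehandles, call read_map on the first, and pass the other two to write_sorted together with the two hash references it returned. The hash lookup replaces the inner search over the header row, so the work per data row grows with the number of selected columns and no longer with selected columns times header columns.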
