perl.beginners | Postings from June 2012

Sluggish code

From: venkates
Date: June 11, 2012 07:31
Subject: Sluggish code
Message ID: 4FD6013D.2080201@nt.ntnu.no

Hi all,

I am trying to filter the files in a directory (code provided below) by
comparing the contents of each file against a hash ref (a parsed ID map
file passed in as an argument). The code works, but it is extremely
slow. The 81 .csv files that I am reading are not very large (the
largest is 183,258 bytes). I would appreciate it if you could suggest
improvements to the code.

sub filter {
     my ( $pazar_dir_path, $up_map, $output ) = @_;
     croak "Not enough arguments! " if ( @_ < 3 );

     my $accepted = 0;
     my $rejected = 0;

     opendir DH, $pazar_dir_path
         or croak ("Error in opening directory '$pazar_dir_path': $!");
     open my $OUT, '>', $output
         or croak ("Cannot open file for writing '$output': $!");
     while ( my @data_files = grep(/\.csv$/,readdir(DH)) ) {
         my @records;
         foreach my $file ( @data_files ) {
             open my $FH, '<', "$pazar_dir_path/$file"
                 or croak ("Cannot open file '$file': $!");
             while ( my $data = <$FH> ) {
                 chomp $data;
                 my $record_output;
                 @records = split /\t/, $data;
                 foreach my $up_acs ( keys %{$up_map} ) {
                     foreach my $ensemble_id ( @{$up_map->{$up_acs}{'Ensembl_TRS'}} ) {
                         if ( $records[1] eq $ensemble_id ) {
                             $record_output = join( "\t", @records );
                             print $OUT "$record_output\n";
                             $accepted++;
                         }
                         else {
                             $rejected++;
                             next;
                         }
                     }
                 }
             }
             close $FH;
         }
     }
     close $OUT;
     closedir (DH);
     print "accepted records: $accepted\n, rejected records: $rejected\n";
     return $output;
}

__DATA__

TF0000210    ENSMUST00000001326    SP1_MOUSE    GS0000422    ENSMUSG00000037974    7    148974877    149005136    Mus musculus    MUC5AC    14570593    ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)::SUPERSHIFT
TF0000211    ENSMUST00000066003    SP3_MOUSE    GS0000422    ENSMUSG00000037974    7    148974877    149005136    Mus musculus    MUC5AC    14570593    ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)::SUPERSHIFT
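
The sub above compares every record against every ID of every accession in the map, so the work grows with (number of records) x (number of IDs). One possible way to cut that down, sketched here under assumptions rather than offered as a definitive fix, is to flatten the 'Ensembl_TRS' lists into a single lookup hash once and then test each record with one exists check. The helper names build_ensembl_lookup and filter_lines, the accession keys, and the shortened sample lines are hypothetical; only the map shape ($up_map->{$up_acs}{'Ensembl_TRS'}) and the use of the second tab-separated column come from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): flatten every
# 'Ensembl_TRS' list in the ID map into one lookup hash.
sub build_ensembl_lookup {
    my ($up_map) = @_;
    my %wanted;
    for my $entry ( values %{$up_map} ) {
        # Accessions without an 'Ensembl_TRS' list contribute nothing.
        $wanted{$_} = 1 for @{ $entry->{'Ensembl_TRS'} || [] };
    }
    return \%wanted;
}

# Hypothetical helper: keep the tab-separated lines whose second
# column is a key of the lookup hash.
sub filter_lines {
    my ( $wanted, @lines ) = @_;
    my @kept;
    for my $line (@lines) {
        chomp( my $copy = $line );
        my @fields = split /\t/, $copy;
        push @kept, $copy
            if defined $fields[1] && exists $wanted->{ $fields[1] };
    }
    return @kept;
}

# Made-up ID map in the shape the original sub expects.
my $up_map = {
    ACC_A => { Ensembl_TRS => ['ENSMUST00000001326'] },
    ACC_B => { Ensembl_TRS => [] },
};
my $wanted = build_ensembl_lookup($up_map);

# Shortened sample records (first three columns only).
my @kept = filter_lines(
    $wanted,
    "TF0000210\tENSMUST00000001326\tSP1_MOUSE\n",
    "TF0000211\tENSMUST00000066003\tSP3_MOUSE\n",
);
print "$_\n" for @kept;
```

Note that this sketch changes two behaviours: a record is kept at most once even if its ID is listed under several accessions, and a "rejected" count would be per record rather than per comparison as in the sub above.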


Thanks a lot,

Aravind
