perl.ithreads | Postings from December 2008

No gain in speed with threads

From: Blanchette, Marco
Date: December 31, 2008 06:33
Subject: No gain in speed with threads
Message ID: C580DEBB.C184%MAB@Stowers-Institute.org
Dear all,

I am trying to speed up a very long procedure that I need to run on multiple files, and I thought that I could run the jobs for different files in separate threads across multiple CPUs. For some reason that I don't really understand, I only get a very small gain in speed. I have included my script, which simply repeats the same function, extractSeq(), on multiple files using a maximum of four threads.
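The script starts up to four threads, joins the whole batch, and only then starts the next batch, so each batch runs as slowly as its slowest file. The following is a minimal sketch of one alternative. It assumes a perl built with ithreads and the core Thread::Queue module. A fixed set of worker threads pulls file names from a shared queue, so a thread moves on to the next file as soon as it finishes one. It also creates only that fixed number of threads instead of one per file. This matters because each Perl ithread starts as a copy of the parent interpreter. `run_pool` is a hypothetical helper, not part of threads or BioPerl. It only changes the scheduling, so if the job is limited by disk I/O it will not by itself give a large speedup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: run $worker on every item using exactly $n_threads
# worker threads that pull their work from one shared queue.
sub run_pool {
    my ($worker, $n_threads, @items) = @_;

    # Queue all the work, then one undef per thread as an end-of-work marker;
    # FIFO order guarantees the markers are seen only after the real items.
    my $queue = Thread::Queue->new(@items);
    $queue->enqueue(undef) for 1 .. $n_threads;

    my @workers;
    for (1 .. $n_threads) {
        # List context so that join() returns every result of the thread.
        push @workers, threads->create({ 'context' => 'list' }, sub {
            my @results;
            # Take the next item as soon as the previous one is finished,
            # instead of waiting for a whole batch to complete.
            while (defined(my $item = $queue->dequeue)) {
                push @results, $worker->($item);
            }
            return @results;
        });
    }

    # Wait for all workers and collect their results (input order is not kept).
    return map { $_->join } @workers;
}

# Usage with a toy worker; in the script's main loop the call would be
# run_pool(\&extractSeq, $opt_p, @ARGV).
my @squares = sort { $a <=> $b } run_pool(sub { $_[0] * $_[0] }, 4, 1 .. 8);
print "@squares\n";    # prints 1 4 9 16 25 36 49 64
```

A queue keeps all threads busy until the list of files is exhausted, whereas fixed batches leave finished threads idle while the slowest file in the batch completes.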

I would really appreciate it if someone could help me finally understand how to use threads to speed up some of my lengthy scripts.

Thanks

Marco

#!/usr/local/bin/perl -w

use strict;
use Bio::SeqIO;
use threads;
use Getopt::Std;

our $opt_p;

init();

# Start up to $opt_p threads (one per file), then wait for the whole batch
# to finish before starting the next batch.
my @thr;
for my $i (0 .. $#ARGV) {
  push @thr, threads->create(\&extractSeq, $ARGV[$i]);
  if (@thr == $opt_p || $i == $#ARGV) {
    print "Running ", scalar(@thr), " parallel jobs\n";
    $_->join for @thr;
    @thr = ();
  }
}

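sub extractSeq {
  my $file = shift;

  # Split "dir/name.ext" into directory, base name and extension; the
  # cleaned file is written to the current directory as name_CleanSeq.ext.
  my ($dir, $pre, $suf) = ($file =~ /(^.+\/|^)(.+)\.(.+$)/)
    or die "Cannot parse file name '$file'\n";
  my $out_name = "${pre}_CleanSeq.$suf";

  my $seqin   = Bio::SeqIO->new(-file => $file,        -format => 'fasta');
  my $seq_out = Bio::SeqIO->new(-file => ">$out_name", -format => 'fasta');

  while (my $seq = $seqin->next_seq) {
    if ($seq->seq =~ /AGATC/) {
      # $-[0] is the 0-based offset of the primer; subseq() is 1-based and
      # inclusive, so 1..$-[0]+5 keeps the bases before the primer plus
      # the 5 nt of the primer itself.
      $seq->seq($seq->subseq(1, $-[0] + 5));
      $seq_out->write_seq($seq);
    }
  }
  return 0;
}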


sub init {
  getopts("p:");
  unless (@ARGV) {
    print("extractseq.pl [-p 4] seq_1.fa [seq_2.fa ...]\n\n",
      "Takes Solexa sequences in FASTA format and\n",
      "\t1) Finds the B primer\n",
      "\t2) Extracts the sequence before the B primer, keeping the first 5 nt of the B primer\n\n",
      "-p\tNumber of threads (processors) to use when more than one file is given on the command line\n",
      "\tDefault 4\n\n");
    exit(1);
  }
  $opt_p = 4 unless $opt_p;
  return(0);
}
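The usage text describes the per-read operation: find the B primer and keep the sequence in front of it plus 5 nt of the primer. Below is a minimal sketch of that step on plain strings, without BioPerl, so it can be checked in isolation. It assumes the primer is the fixed string AGATC used in extractSeq(), and `trim_at_primer` is a hypothetical name. It uses `index` instead of a regex, which finds the same first occurrence for a fixed string. The 0-based `substr($seq, 0, $pos + 5)` covers the same bases as the script's 1-based `subseq(1, $-[0] + 5)`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-alone version of the trimming step in extractSeq():
# return everything before the first occurrence of $primer plus $keep bases
# starting at the primer, or undef (empty list) if the primer is absent.
sub trim_at_primer {
    my ($seq, $primer, $keep) = @_;
    my $pos = index($seq, $primer);    # 0-based offset, -1 if not found
    return if $pos < 0;
    return substr($seq, 0, $pos + $keep);
}

my $read    = 'TTGCAAGATCGGAAGAGC';
my $trimmed = trim_at_primer($read, 'AGATC', 5);
print defined $trimmed ? "$trimmed\n" : "primer not found\n";    # prints TTGCAAGATC
```

Reads without the primer are skipped, which matches the script, where only sequences matching /AGATC/ are written out.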

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
