perl.beginners | Postings from March 2002

Fuzzy Matching

March 21, 2002 01:32
I am attempting to do a "fuzzy match" with the String::Approx (v.3) module,
with very limited success.
I am working with a biological genome sequence: a 30136242-character
string (which I load into $seq) in which each character is either A, T, G
or C (or, more rarely, N to denote that it could be any of A, T, G or
C). I then want to match 15-20 characters against this 30136242-character
string.

I have written the code below, but it generally stops after finding only
one hit when I know there are more in there. The aindex and aslice
functions do not seem to take an offset, so I am having to trim the search
string myself to progress along it. From the documentation I expected aslice
to return a two-element list which would be placed into $index and $size;
instead I seem to get an array reference returned into $index, and $size is
left undefined.
Any help / advice on this would be greatly appreciated.
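
For reference, the array reference described above matches how the
String::Approx documentation describes aslice when the inputs are passed
explicitly: it returns one [index, size] pair per input rather than a flat
two-element list. A minimal sketch of unpacking that pair, assuming the
documented behaviour; the toy sequence and primer are made up for
illustration and are not real data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use String::Approx qw(aslice);

# Toy data (hypothetical), standing in for the chromosome and the primer.
my $seq    = 'TTTTGATTACATTTT';
my $primer = 'GATTACA';

# With an explicit input, aslice returns one array reference per input,
# so the single return value is unpacked in a second step.
my ($slice) = aslice($primer, ['1'], $seq);

# An empty array reference means no match; both variables then stay undef.
my ($index, $size) = $slice ? @$slice : ();

if (defined $index) {
    print "hit at $index, length $size: ", substr($seq, $index, $size), "\n";
} else {
    print "no hit\n";
}
```

The flat ($index, $size) form in the documentation appears to apply only
when aslice matches against $_ with no input list.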


#!/usr/bin/perl -w
use String::Approx qw(amatch aindex aslice); #Fuzzy matching

die "Syntax: primerSearch Chromosome_number, Number_Point_mutations, "
  . "Primer_Sequence" if (@ARGV != 3);

open (CHR,"<chromo$ARGV[0]_pseudo_v080501.seq")
  or die "Cannot open chromo$ARGV[0]_pseudo_v080501.seq: $!";
$seq = <CHR>;
close (CHR);

$a = $ARGV[2];
# Reverse sequence
my ($ra) =&rev($a);

my $addf = 0;
my $indx;
my $flag;

do {
  undef $indx;
  undef $flag;
  my ($index,  $size)  = aslice($a, ["$ARGV[1]"], $seq);

  while ( $indx = shift(@$index)) {
    $flag = 1;
    my $sizx = shift(@$index);
    my $sq = substr($seq,$indx,$sizx);
    print ("\t" , $indx+$addf , "\t($sizx)\tSeq: $sq\n");
    $addf += ($indx + 1);
    $seq = substr($seq,$indx,length($seq));
  }
} while ( defined $flag );

sub rev {
  my $reversed_seq = reverse $_[0];
  $reversed_seq =~ tr/ATGC/TACG/;
  return $reversed_seq;
}
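
One likely reason the loop stops after a single hit, offered as a sketch
rather than a definitive fix: substr($seq,$indx,length($seq)) keeps the hit
at the very start of the trimmed string, so the next search reports index 0,
and while ( $indx = shift(@$index)) treats 0 as false and ends the loop. The
version below tests with defined() and restarts one character past each hit.
The name find_hits and the toy sequence are hypothetical, and the return
shape of aslice is assumed to be as documented for String::Approx 3.x:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use String::Approx qw(aslice);

# Collect every approximate hit of $primer in $seq. Returns a list of
# [position, size] pairs, with positions relative to the full string.
sub find_hits {
    my ($primer, $max_edits, $seq) = @_;
    my @hits;
    my $offset = 0;    # where the still-unsearched tail begins in $seq

    while ($offset < length $seq) {
        # Search only the unsearched tail; substr copies it, $seq stays intact.
        my ($slice) = aslice($primer, [$max_edits], substr($seq, $offset));
        my ($index, $size) = $slice ? @$slice : ();

        # Test with defined(): an index of 0 is a valid hit but false in Perl.
        last unless defined $index && $index >= 0;

        push @hits, [ $offset + $index, $size ];
        $offset += $index + 1;    # restart one character past this hit's start
    }
    return @hits;
}

# Toy data (hypothetical); the real input would be the chromosome string.
my $seq = 'GATTACATTTTTTTTTTGATTACA';
for my $hit (find_hits('GATTACA', 1, $seq)) {
    my ($pos, $size) = @$hit;
    print "\t$pos\t($size)\tSeq: ", substr($seq, $pos, $size), "\n";
}
```

Because the restart point is only one character past each hit, overlapping
near-duplicates of the same site may be reported. Copying the tail with
substr on every pass is also costly on a 30-million-character string, so
this is meant to show the control flow rather than a tuned search.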