perl.beginners
Postings from March 2002
Fuzzy Matching
From: paul.beckett
Date: March 21, 2002 01:32
Subject: Fuzzy Matching
Message ID: E4D0A20B9E9ED4118E3C00508BEED171016B3982@jimserv2.jic.bbsrc.ac.uk
I am attempting to do a "fuzzy match" with the String::Approx (v.3) module,
with very limited success.
I am working with a biological genome sequence: a string 30136242 characters
long (which I load into $seq). Each character is A, T, G or C, or more rarely
N to denote that it could be any of A, T, G or C. I then want to match a
15-20 character primer against this 30136242-character string.
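For comparison, the simplest form of this fuzzy match, allowing only substitutions (point mutations) and no insertions or deletions, is a sliding Hamming-distance scan. This is a swapped-in technique for illustration, not what String::Approx does: its edit distance also admits insertions and deletions, so its hits can differ. The helper names mismatches() and hamming_hits() are hypothetical, and treating N in the sequence as matching any base is an assumption based on the description above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count substitutions between $primer and an equal-length $window.
# Assumption: an N in the sequence window matches any base.
# Returns early once the count exceeds $max (the caller only needs <= $max).
sub mismatches {
    my ($primer, $window, $max) = @_;
    my $count = 0;
    for my $i (0 .. length($primer) - 1) {
        my $s = substr($window, $i, 1);
        next if $s eq 'N' || $s eq substr($primer, $i, 1);
        return $count if ++$count > $max;
    }
    return $count;
}

# Return the start of every window of $seq within $max substitutions
# of $primer (overlapping windows are all checked).
sub hamming_hits {
    my ($primer, $seq, $max) = @_;
    my $len = length $primer;
    my @starts;
    for my $pos (0 .. length($seq) - $len) {
        push @starts, $pos
            if mismatches($primer, substr($seq, $pos, $len), $max) <= $max;
    }
    return @starts;
}

my @starts = hamming_hits("ACGT", "TTACGAACNTACCT", 1);
print "@starts\n";    # prints "2 6 10"
```

In pure Perl this visits every window of the 30136242-character string, so expect it to be slow on a whole chromosome; it is meant to show what "N point mutations" means, not to replace the module.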
I have written the code below, but it generally seems to stop after finding
only one hit when I know there are more in there. The aindex and aslice
functions do not seem to take an offset, so I am trimming the sequence string
myself to progress along it. From the documentation I expected aslice to
return a two-element list that would be assigned to $index and $size; instead
I seem to get an array reference in $index, and $size is left undefined.
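The behaviour described here, a single array reference where a two-element list was expected, can be unpacked with one extra dereference. A minimal sketch: fake_aslice() is a hypothetical stand-in that mimics what the post reports (one array reference holding index and size); it is not the real String::Approx call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in mimicking the reported behaviour:
# one array reference holding [index, size], not a flat list.
sub fake_aslice { return [ 12, 18 ] }

my ($index, $size) = fake_aslice();   # $index gets the ref, $size stays undef
print defined $size ? "size set\n" : "size undefined\n";

my ($slice) = fake_aslice();          # capture the reference ...
($index, $size) = @$slice;            # ... then dereference it
print "index=$index size=$size\n";    # prints "index=12 size=18"
```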
Any help / advice on this would be greatly appreciated.
Cheers
Paul
#!/usr/bin/perl -w
use String::Approx qw(amatch aindex aslice); #Fuzzy matching
die "Syntax: primerSearch Chromosome_number, Number_Point_mutations, Primer_Sequence" if (@ARGV != 3);
open (CHR,"<chromo$ARGV[0]_pseudo_v080501.seq") or die "Cannot open chromosome file: $!";
$seq = <CHR>;
close (CHR);
$a = $ARGV[2];
# Reverse complement of the primer ($ra is not used further below)
my ($ra) =&rev($a);
my $addf = 0;
my $indx;
my $flag;
do {
undef $indx;
undef $flag;
my ($index, $size) = aslice($a, ["$ARGV[1]"], $seq);
while ( $indx = shift(@$index)) {
$flag = 1;
my $sizx = shift(@$index);
my $sq = substr($seq,$indx,$sizx);
print ("\t" , $indx+$addf , "\t($sizx)\tSeq: $sq\n");
$addf += ($indx + 1);
$seq = substr($seq,$indx,length($seq));
}
} while ( defined $flag );
sub rev {
my $reversed_seq = reverse $_[0];
$reversed_seq =~ tr/ATGC/TACG/;
return $reversed_seq;
}
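One reading of the loop above, offered as a likely explanation rather than a confirmed one: after a hit at $indx, $seq is cut to start at $indx, not $indx + 1, so the same hit sits at position 0 of the shortened string. The next aslice call then finds it there, `while ( $indx = shift(@$index))` treats the returned 0 as false, $flag stays undefined, and both loops end after one hit. The $addf bookkeeping also adds $indx + 1 while the string was only shortened by $indx. The sketch below keeps the sequence intact and tracks an absolute offset instead. first_hit() is a hypothetical stand-in for aslice that does exact matching with index() and returns [index, size] or undef, purely so the sketch runs without String::Approx; how the real aslice signals "no match" is not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for aslice(): returns an array reference
# [index, size] for the FIRST match in $string, or undef if none.
# Exact matching via index() only, so the sketch runs without the module.
sub first_hit {
    my ($pattern, $string) = @_;
    my $i = index($string, $pattern);
    return $i < 0 ? undef : [ $i, length $pattern ];
}

# Collect every hit by re-searching the remainder of $seq after each one.
# $offset is where the remainder starts in the full sequence, so each
# reported start is an absolute position in the untrimmed $seq.
sub all_hits {
    my ($pattern, $seq) = @_;
    my @hits;
    my $offset = 0;
    while ($offset < length $seq) {
        my $slice = first_hit($pattern, substr($seq, $offset)) or last;
        my ($index, $size) = @$slice;    # index may legitimately be 0
        my $start = $offset + $index;
        push @hits, [ $start, $size, substr($seq, $start, $size) ];
        $offset = $start + 1;            # step past this hit's start
    }
    return @hits;
}

my $seq = "ACGTTACGTAACGT";
for my $hit (all_hits("ACGT", $seq)) {
    my ($start, $size, $match) = @$hit;
    print "\t$start\t($size)\tSeq: $match\n";
}
```

The loop tests the returned reference rather than the index, so a hit at index 0 is not mistaken for "no more hits". Each call still copies the remainder of the string with substr, which will be slow on a full chromosome; the point here is only the offset arithmetic.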