develooper Front page | perl.fwp | Postings from March 2002

longest common substring (return of)

Thread Next
From:
Ted Zlatanov
Date:
March 13, 2002 07:42
Subject:
longest common substring (return of)
Message ID:
m3it7zqhvu.fsf@heechee.beld.net
I remember a lengthy LCS discussion, and the solutions for the most
part used the regex engine.  I also looked on CPAN, but couldn't find
a canonical LCS module.

Has anyone developed an implementation of the well-known bitstring
algorithm?  Basically you convert your data strings to bitstrings, AND
the two, and look for the longest match.  Then you rotate the shorter
data string by one, and repeat the test.  Repeat for the number of
bits in the shorter data string.  I want to do it, but just wanted to
check if it had come up before on FWP.

There are even better algorithms to do this.  I found them with
Google; for instance the Goeman & Clausen 1999 "A New Practical Linear
Space Algorithm for the LCS Problem" paper looks interesting but is
beyond what I remember of my college math.  It describes a 
O(ns + min(mp, (m log(m)) + p(n-p))) time, linear space algorithm and a
O(ns + min(mp, p(n-p))) time, O(ns) space algorithm, and claims the
latter to be the fastest known such algorithm (m = size of string A, 
n = size of string B, n>= m, s = alphabet size, p = length of LCS).
I'm pretty sure implementations of the Goeman & Clausen algorithms
haven't been done in Perl yet.

-- 
Teodor Zlatanov <tzz@iglou.com>
"Brevis oratio penetrat colos, longa potatio evacuat ciphos." -Rabelais


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About