develooper Front page | perl.beginners | Postings from March 2002

Re: Books on advanced text processing

Peter Scott
March 30, 2002 08:36
At 11:36 PM 3/29/02 -0500, Jim Witte wrote:
> I'm contemplating writing some software to scan through a large volume
> of email (over 95 MB) to identify threads and remove quoted material.
> Does anyone have any good references on algorithms to do text processing
> like this for such a massive amount of data?

Is this something you're planning on doing once, or many times?  95MB is 
nothing; right now I'm scanning through several hundred gigabytes of 
text.  Do you need sub-second response on this?  If not, I don't see the 
need for advanced algorithms.
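
To illustrate why a simple approach is usually enough at this scale, here is a minimal sketch of the two tasks in the question: dropping quoted material and grouping messages into threads. It is written in Python and is only one possible approach. The function names strip_quotes and thread_roots are hypothetical. It assumes quoted lines start with ">", that attribution lines look like "At ... wrote:" or "On ... wrote:", and that replies carry an In-Reply-To header. Real mail will break those assumptions in places.

```python
import re
from collections import defaultdict
from email import message_from_string

# Assumption: quoted lines begin with ">" (optionally indented).
QUOTE = re.compile(r"^\s*>")
# Assumption: attribution lines look like "At <time>, <name> wrote:".
ATTRIBUTION = re.compile(r"^(At|On) .* wrote:\s*$")


def strip_quotes(body):
    """Return the body with quoted lines and attribution lines removed."""
    kept = [
        line
        for line in body.splitlines()
        if not QUOTE.match(line) and not ATTRIBUTION.match(line)
    ]
    return "\n".join(kept).strip()


def thread_roots(messages):
    """Group Message-IDs by thread root, following In-Reply-To links.

    `messages` is an iterable of raw message strings (headers plus body).
    """
    parent = {}
    for raw in messages:
        msg = message_from_string(raw)
        parent[msg["Message-ID"]] = msg["In-Reply-To"]

    def root(mid):
        # Walk up until the parent is missing or unknown; guard against loops.
        seen = set()
        while parent.get(mid) in parent and mid not in seen:
            seen.add(mid)
            mid = parent[mid]
        return mid

    threads = defaultdict(list)
    for mid in parent:
        threads[root(mid)].append(mid)
    return dict(threads)
```

Both functions do work proportional to the size of the input and keep only one small dictionary entry per message, so 95 MB of mail should not need anything more elaborate.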

Peter Scott
Pacific Systems Design Technologies
