develooper Front page | perl.libwww | Postings from January 2002

Re: simple robot?

Robert Barta
January 6, 2002 14:43
Re: simple robot?
On Sun, Dec 30, 2001 at 07:05:03PM +0100, Matej Kovacic wrote:
> I have a question... does anyone have - and is willing to give - a program
> which takes the URL of a website as an input parameter and then builds a
> tree or lists all HTML files within that site?

Yes, I have written a module I tentatively called WWW::Analyze. It subclasses
WWW::Robot. It never made it onto CPAN because it is fairly undocumented (or
wrongly documented) and has no decent test suite. Yet. Volunteers welcome.
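For anyone who only needs the list-of-pages part of the original question, a
WWW::Robot-based crawler can be sketched in a few lines. This is a minimal
sketch, not code from WWW::Analyze: it assumes the hook names
('follow-url-test', 'invoke-on-contents') and callback argument order as given
in WWW::Robot's POD, and the robot name and e-mail address are placeholders
you should replace with your own.

```perl
#!/usr/bin/perl
use strict;
use URI;
use WWW::Robot;

# Restrict the crawl to the host we started on.
sub same_host {
    my ($base, $url) = @_;
    return lc($url->host) eq lc($base->host);
}

if (my $start = shift @ARGV) {
    my $base  = URI->new($start);
    my $robot = WWW::Robot->new(
        NAME    => 'SiteLister',        # placeholder robot name
        VERSION => '0.1',
        EMAIL   => 'you@example.com',   # placeholder contact address
    );

    # Only follow links that stay on the starting site.
    $robot->addHook('follow-url-test', sub {
        my ($robot, $hook, $url) = @_;
        return same_host($base, $url);
    });

    # Print the URL of every HTML page the robot fetches.
    $robot->addHook('invoke-on-contents', sub {
        my ($robot, $hook, $url, $response) = @_;
        print "$url\n" if $response->content_type eq 'text/html';
    });

    $robot->run($start);
}
```

Run it as `perl sitelister.pl http://www.example.com/` and pipe the output
wherever you like; check the WWW::Robot POD for the exact hook signatures
before relying on this.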

       WWW::Analyze - Perl extension for web site analysis

         use WWW::Analyze;

         $a = new WWW::Analyze ();

         $a->run ('');

       The WWW::Analyze module is a specialized robot that
       analyzes a given web site for inconsistencies as well as
       general statistics (see the STATISTICS manpage). When
       started, the robot will iterate over the site (with a
       given set of predefined policies) and gather information
       about the pages and the site structure. This structure
       can then be ...

I have uploaded it to

> If possibly to the arbitrary depth.

Yes, it can do that, although the collected data gets pretty big fast.
As this was written to check student web pages for plagiarism, there is
even an option to check every page against Google and find similarities
with LCS (longest common subsequence). Perl/CPAN is simply amazing.
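For the curious, an LCS-based similarity score takes only a few lines of
dynamic programming. This is a sketch of the general technique, not the
actual code from WWW::Analyze; here the two pages are compared word by word,
and the score is the LCS length divided by the longer word count:

```perl
use strict;

# Length of the longest common subsequence of two token lists,
# using the standard dynamic-programming recurrence with two rows.
sub lcs_length {
    my ($a, $b) = @_;
    my @prev = (0) x (@$b + 1);
    for my $i (1 .. @$a) {
        my @cur = (0);
        for my $j (1 .. @$b) {
            $cur[$j] = $a->[$i-1] eq $b->[$j-1]
                     ? $prev[$j-1] + 1
                     : ($prev[$j] > $cur[$j-1] ? $prev[$j] : $cur[$j-1]);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Similarity of two documents in [0,1]: LCS length over the
# longer document's word count.
sub similarity {
    my ($text1, $text2) = @_;
    my @w1 = split ' ', $text1;
    my @w2 = split ' ', $text2;
    my $longer = @w1 > @w2 ? scalar @w1 : scalar @w2;
    return $longer ? lcs_length(\@w1, \@w2) / $longer : 1;
}

printf "%.2f\n", similarity("the quick brown fox", "the slow brown fox");  # 0.75
```

The DP table is O(n*m), which is why checking every page against many
candidates makes the collected data (and runtime) grow quickly, as noted
above.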


PS: I'm open to suggestions for a better name!
