perl.beginners | Postings from August 2011

Re: Spidering

From: C.DeRykus
Date: August 3, 2011 03:10
Subject: Re: Spidering
Message ID: dc049fdb-b52a-401b-821e-6c420abf0734@r5g2000prf.googlegroups.com
On Aug 1, 10:51 am, rob.di...@gmx.com (Rob Dixon) wrote:
> On 01/08/2011 11:03, VinoRex.E wrote:
>
> > Hi everyone, I am a beginner in Perl. Can you give me pseudocode and
> > sample code for a spider program? It will be helpful in understanding web
> > interfaces. Thank you.
>
> If you can't write your own pseudocode for a web spider, then check
> Bharathiar University for a more appropriate course. One version goes:
>
>    function fetchall(URL)
>      content = get(URL)
>      loop for it over findlinks(content)
>        content = content + fetchall(it)
>      return content
>    end
>
> Since the purpose of your efforts is to learn Perl, I think a module
> like WWW::Mechanize is the wrong choice. To write a program that
> accesses the internet, you should install and study the LWP library.
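A minimal Perl sketch of the quoted fetchall() pseudocode, using only core Perl. It adds one thing the pseudocode leaves out: a %seen hash, because as written fetchall() would recurse forever on two pages that link to each other. The page fetcher is passed in as a code reference (a hypothetical design choice), so the demo runs against a made-up in-memory "web"; with LWP, the fetcher could wrap LWP::UserAgent's get(). The regex in find_links() is a deliberate simplification that ignores relative links and much of real HTML; HTML::LinkExtor and URI's new_abs() are the usual tools for that job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the href targets found in $content.
# Crude regex, for illustration only: real pages need HTML::LinkExtor,
# and relative links need resolving with URI->new_abs.
sub find_links {
    my ($content) = @_;
    return $content =~ /href\s*=\s*["']([^"']+)["']/gi;
}

# fetchall(URL) from the pseudocode, plus a %seen hash so that pages
# linking to each other are fetched once instead of recursing forever.
# $fetch is a code ref returning the page content, or undef on failure.
# With LWP it could be:
#   sub { my $r = $ua->get(shift); $r->is_success ? $r->decoded_content : undef }
sub fetchall {
    my ($url, $fetch, $seen) = @_;
    $seen ||= {};
    return '' if $seen->{$url}++;        # visited already: stop here

    my $content = $fetch->($url);
    return '' unless defined $content;   # fetch failed

    for my $link (find_links($content)) {
        $content .= fetchall($link, $fetch, $seen);
    }
    return $content;
}

# Hypothetical two-page "web" in which each page links to the other.
my %web = (
    'http://example.com/a' => '<a href="http://example.com/b">B</a> page A',
    'http://example.com/b' => '<a href="http://example.com/a">A</a> page B',
);
my $all = fetchall('http://example.com/a', sub { $web{ $_[0] } });
print "$all\n";
```

A real spider would usually also cap the recursion depth or stay on a single host, since following every link quickly leaves the site you started on.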

LWP::RobotUA, a subclass of LWP::UserAgent, can be used together with
the other modules in the LWP library suite too. It takes care of
appropriate spidering behavior for you, i.e., not hitting a site too
fast and heeding the site's robots.txt rules. This is very important
for any spidering program you write.
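A minimal sketch of LWP::RobotUA in use, assuming the LWP distribution is installed. The agent name and e-mail address are placeholders; RobotUA requires a contact (from) address so site owners can reach whoever runs the robot. delay() is given in minutes (the default is one minute), so 10/60 asks for at least ten seconds between requests to the same server. A URL that the site's robots.txt disallows is not requested; it comes back as a 403 response with the message "Forbidden by robots.txt". The helper names make_robot() and fetch_politely() are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# Build a polite robot. The agent name and e-mail address are
# placeholders; replace them with your own.
sub make_robot {
    my $ua = LWP::RobotUA->new(
        agent => 'BeginnerSpider/0.1',
        from  => 'you@example.com',
    );
    $ua->delay(10/60);   # minutes: wait >= 10 s between hits on one server
    $ua->timeout(30);    # seconds, inherited from LWP::UserAgent
    return $ua;
}

# Fetch one page; robots.txt is consulted automatically before the request.
sub fetch_politely {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    return $res->decoded_content if $res->is_success;
    # Disallowed URLs come back as "403 Forbidden by robots.txt".
    warn "Skipping $url: ", $res->status_line, "\n";
    return;
}

# Only touch the network when URLs are given on the command line.
if (@ARGV) {
    my $ua = make_robot();
    for my $url (@ARGV) {
        my $page = fetch_politely($ua, $url);
        print length($page), " bytes from $url\n" if defined $page;
    }
}
```

Because use_sleep is on by default, the robot simply sleeps when requests arrive too fast instead of failing them, which keeps a beginner's loop simple.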

--
Charles DeRykus




nntp.perl.org: Perl Programming lists via nntp and http.