perl.libwww | Postings from January 2002

Parsing Directory Listings

Chris Martino
January 15, 2002 11:02

I'm writing a script to log in to a secure website using HTTP Basic
authentication.  Once logged in, I get a directory listing of files with
their date/time stamps, etc.  What I'm trying to do is extract the time
stamp and file name of each file so I can compare them against the current
date/time.  There are multiple files to be checked.

I've gotten LWP compiled with the Crypt::SSLeay module so I can access
secure HTTP sites, which works as advertised.  I've written the script,
which logs into the site using basic auth.  Now I get a bunch of
HTML gibberish, like so:

[staging@dev2 bin]$ ./chk_from_allfirst
<head><title> -
/transfer/Company/</title></head><body><H1> -

<pre><A HREF="/transfer/">[To Parent Directory]</A><br><br>   1/15/02
2:00 AM        23520 <A
HREF="/transfer/Company/CARDS.TXT">CARDS.TXT</A><br>  10/23/00  9:52 AM
67 <A HREF="/transfer/Company/error-1016.txt">error-1016.txt</A><br>
1/15/02  2:00 AM      1335656 <A
HREF="/transfer/Company/EXCPTFLE.TXT">EXCPTFLE.TXT</A><br>   11/1/97  8:18
AM         3285 <A HREF="/transfer/Company/legal.htm">legal.htm</A><br>
1/14/02  6:53 PM          136 <A
HREF="/transfer/Company/NEWCARDS.TXT">NEWCARDS.TXT</A><br>   1/15/02  2:00
AM       277458 <A HREF="/transfer/Company/POSTED.TXT">POSTED.TXT</A><br>
1/15/02  2:01 AM        43200 <A

All I need is the first two columns (date & time), and the last column
(file name).

How can I strip the HTML tags out and show only the info I need?  I've
tried using HTML::TreeBuilder and HTML::Parser to no avail.

Can someone please point me in the right direction? Examples would be
welcome too. ;)
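
For illustration, one possible direction is a plain regular expression run
over the fetched page.  This is only a minimal sketch: it assumes every
file entry has the shape "date  time AM/PM  size <A HREF=...>name</A>" as
in the output above, and parse_listing() is a hypothetical helper name, not
part of LWP or any other module.  Entries without a numeric size (such as
subdirectories, if the server lists them differently) would be skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull date, time and file name out of an HTML
# directory listing shaped like the sample output above.
sub parse_listing {
    my ($html) = @_;
    my @entries;

    # \s+ also matches newlines, so entries wrapped across lines still match.
    while ($html =~ m{
            (\d{1,2}/\d{1,2}/\d{2,4}) \s+       # date, e.g. 1/15/02
            (\d{1,2}:\d{2} \s+ [AP]M) \s+       # time, e.g. 2:00 AM
            \d+ \s+                             # size in bytes (ignored)
            <A \s+ HREF="[^"]*"> ([^<]+) </A>   # link text is the file name
        }gxi) {
        my ($date, $time, $name) = ($1, $2, $3);
        $time =~ s/\s+/ /g;    # collapse a line break inside the time
        push @entries, { date => $date, time => $time, name => $name };
    }
    return @entries;
}

# Sample input copied from the listing above (first two entries only).
my $sample = <<'HTML';
<pre><A HREF="/transfer/">[To Parent Directory]</A><br><br>   1/15/02
2:00 AM        23520 <A
HREF="/transfer/Company/CARDS.TXT">CARDS.TXT</A><br>  10/23/00  9:52 AM
67 <A HREF="/transfer/Company/error-1016.txt">error-1016.txt</A><br>
HTML

# Print one line per file: date, time, name.
for my $entry (parse_listing($sample)) {
    printf "%-10s %-8s %s\n", @{$entry}{qw(date time name)};
}
```

In the real script, the string passed to parse_listing() would be the body
of the response that LWP returns.  A regex like this is tied to the exact
layout the server emits; if the markup varies, a real HTML parser is likely
the sturdier choice.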

Chris