perl.beginners | Postings from February 2002

regex to parse HTML files

From: sachin balsekar
Date: February 26, 2002 06:17
Subject: regex to parse HTML files
Message ID: 3C7B9C43.3000200@myiris.com
Hi all,

I have one HTML file per news story. I need to fetch some data (the
first few lines) from each HTML file and display it as an abstract for
that story.

The HTML files have the following issues:

1. There could be an HTML table at the very beginning of the file. Can I
strip out the whole table, i.e. everything from <TABLE to </TABLE>? That
may cause problems with nested tables. (I'm trying a regex for this.)

2. I need to pick up the first one or two sentences. I look for a '.'
and take the text up to it, but that fails for abbreviations and
numbers (e.g. "Ltd." or "5.8%").
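One way to handle the nested tables in issue 1 without relying on a single regex is to scan for table tags one at a time and count the nesting depth, dropping everything between an outermost <TABLE> and its matching </TABLE>. This is a minimal sketch, not a full HTML parser: `strip_tables` is a hypothetical helper name, and it assumes table tags never appear inside comments, scripts, or attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every <TABLE>...</TABLE> block, including
# tables nested inside it. A plain (non-recursive) regex cannot balance
# nested tags, so this walks the table tags and tracks nesting depth.
sub strip_tables {
    my ($html) = @_;
    my $out   = '';
    my $depth = 0;    # number of <table> tags currently open
    my $pos   = 0;    # where the text outside tables resumes

    while ($html =~ /<(\/?)table\b[^>]*>/gi) {
        my $is_close = $1 eq '/';
        if (!$is_close) {
            # Opening tag: keep the text that precedes an outermost table
            $out .= substr($html, $pos, $-[0] - $pos) if $depth == 0;
            $depth++;
        }
        elsif ($depth > 0) {
            $depth--;
            # Closed the outermost table: resume copying right after it
            $pos = $+[0] if $depth == 0;
        }
        # A stray </table> with no open table stays in the text
    }

    # Keep the trailing text, unless a table was never closed
    $out .= substr($html, $pos) if $depth == 0;
    return $out;
}
```

For pages that are not this simple, a real parser such as HTML::Parser or HTML::TreeBuilder from CPAN is more reliable than tag counting; on Perl 5.10 or later, a recursive regex using `(?R)` is another option.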

Solving these two issues would get me most of the way through the problem.
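For issue 2, one heuristic is to treat '.', '!' or '?' as a sentence end only when it is followed by whitespace or the end of the text (so the '.' in "5.8%" is never a candidate), and to skip periods that follow a known abbreviation or a single letter. This is a sketch under stated assumptions: the helper names `html_to_text` and `first_sentences` and the abbreviation list are hypothetical and would need tuning for real news copy, and the tag removal is deliberately crude.

```perl
use strict;
use warnings;

# Hypothetical abbreviation list (lowercase); extend it for your own stories.
my %ABBREV = map { $_ => 1 } qw(ltd pvt inc co corp mr mrs ms dr rs vs);

# Hypothetical helper: crude tag removal for simple pages, not a real parser.
sub html_to_text {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /g;
    $html =~ s/&nbsp;/ /gi;
    return $html;
}

# Hypothetical helper: return the first $n sentences of $text.
sub first_sentences {
    my ($text, $n) = @_;
    $text =~ s/\s+/ /g;    # collapse newlines and runs of spaces
    $text =~ s/^ //;

    my @sentences;
    my $start = 0;
    # Candidate end: . ! or ? followed by whitespace or end of text
    while ($text =~ /([.!?])(?=\s|\z)/g) {
        my $punct     = $1;
        my $end       = pos($text);
        my $candidate = substr($text, $start, $end - $start);

        # Skip periods after a known abbreviation ("Ltd.") or a single
        # letter ("U.S.", initials); keep scanning from here.
        if ($punct eq '.' && $candidate =~ /(\w+)\.\z/
            && ($ABBREV{lc $1} || length($1) == 1)) {
            next;
        }

        $candidate =~ s/^\s+//;
        push @sentences, $candidate;
        $start = $end;
        last if @sentences == $n;
    }

    # Trailing text without final punctuation counts as one more sentence
    if (@sentences < $n) {
        my $rest = substr($text, $start);
        $rest =~ s/^\s+|\s+$//g;
        push @sentences, $rest if length $rest;
    }
    return join ' ', @sentences;
}
```

The trade-off is that a sentence that genuinely ends in "Ltd." runs on into the next one; a longer, domain-specific abbreviation list reduces such mistakes but cannot eliminate them.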

Please help...

Regards,
sgb








