perl.beginners | Postings from January 2002

RE: extracting links.. continued..

From: Lorne Easton
Date: January 16, 2002 16:10
Subject: RE: extracting links.. continued..
Message ID: 20020117001000.72229.qmail@onion.perl.org
Hi there,

Thanks for the advice. I looked at using HTML::LinkExtor but decided against
it.

I am using code like the following:


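use strict;
use warnings;

sub get_urls {
    my ($data) = @_;
    my @url_array;

    print $data;    # Debugging: dump the raw input

    # Put all <a href ...>...</a> elements into @url_array.
    # .*? is non-greedy, so each match stops at the FIRST </a> instead of
    # swallowing everything up to the last one; /s lets a link span lines.
    while ($data =~ m|(<a href.*?</a>)|gis) {
        my $temp_tag = $1;

        # Strip out tags
        # Insert code here..

        push @url_array, $temp_tag;
    }

    # Temporary: print out all URLs. Testing purposes only.
    foreach my $temp (@url_array) {
        print $temp, "\n";
    }
    # scalar(@url_array) is the count; $#url_array is the last index (count - 1).
    print "\n\n", scalar(@url_array), " URLs found.\n";

    return @url_array;
}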

That works, but it extracts the entire <A HREF="URL">TEXT</A> element. Is
there a way to modify this regexp so that it strips out the surrounding
markup as well? Obviously a match (m//) is inclusive of the matched data. Is
there any way of changing that, or perhaps of writing a regexp that does this?
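A common way to keep only part of a match is to wrap just the wanted pieces in capturing parentheses, so that $1, $2, ... hold those pieces rather than the whole tag. The following is a minimal sketch, assuming double-quoted href values. The helper get_links_and_text and the sample HTML are hypothetical, and a regex like this is not a full HTML parser:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [url, text] pairs.
# Assumes href="..." uses double quotes; not a general HTML parser.
sub get_links_and_text {
    my ($html) = @_;
    my @links;
    # Group 1 = the URL between the quotes, group 2 = the link text.
    # [^>]*? skips attributes before href, [^>]* skips any after it,
    # and (.*?) stops at the first </a>. /i matches <A HREF> too.
    while ($html =~ m{<a\s[^>]*?\bhref="([^"]*)"[^>]*>(.*?)</a>}gis) {
        push @links, [ $1, $2 ];
    }
    return @links;
}

my $html = '<p><a href="http://www.perl.org/">Perl</a> and '
         . '<A HREF="http://cpan.org/">CPAN</A></p>';

for my $pair (get_links_and_text($html)) {
    print "$pair->[0]\t$pair->[1]\n";
}
```

Only the captured groups are kept, so the surrounding markup never reaches @links.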

The problem is that the markup varies. For example, the data could be

<a href = "
<a href="

etc.
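To accept both <a href = " and <a href=", the sketch below allows optional whitespace around the "=" with \s*. It also accepts single quotes: it captures the opening quote and then requires the same character to close the value via the backreference \1. The helper extract_hrefs and the sample input are hypothetical. Unquoted values such as href=foo.html are not matched by this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: extract href values, tolerating whitespace around "="
# and either double or single quotes. Unquoted values are NOT handled.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    # \s*=\s*  accepts href="x", href = "x", href= 'x', ...
    # (["'])   captures the opening quote; \1 requires the same quote to close.
    while ($html =~ m{<a\s[^>]*?\bhref\s*=\s*(["'])(.*?)\1}gis) {
        push @urls, $2;
    }
    return @urls;
}

my $html = qq{<a href = "one.html">1</a>\n<a href="two.html">2</a>\n}
         . qq{<A HREF='three.html'>3</A>};

print "$_\n" for extract_hrefs($html);
```

This prints one.html, two.html and three.html, one per line.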

Perhaps something that grabs all the data between the quotes would be
useful.
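Grabbing "everything between the quotes" is usually done with a negated character class, [^"]*, which cannot run past a closing quote. The short comparison below uses a made-up line containing two links. It shows why a greedy .* is a problem when several links share one line:

```perl
use strict;
use warnings;

my $line = '<a href="a.html">A</a> <a href="b.html">B</a>';

# Greedy: .* runs to the LAST quote on the line, so one match swallows both links.
my ($greedy) = $line =~ /href="(.*)"/;

# Negated class: [^"]* cannot cross a quote, so each match stops at its own
# closing quote. In list context, /g returns every captured value.
my @between = $line =~ /href="([^"]*)"/g;

print "greedy:  $greedy\n";
print "between: @between\n";
```

The greedy capture comes back as a.html">A</a> <a href="b.html. The negated-class version returns a.html and b.html separately.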

Any ideas would be appreciated.

Cheers,
Lorne






nntp.perl.org: Perl Programming lists via nntp and http.