On 11/04/2011 06:43, Shlomi Fish wrote:
> On Sunday 10 Apr 2011 14:05:49 cityuk wrote:
>>
>> This is more of a generic question on regular expressions as my
>> program is working fine but I was just curious.
>>
>> Say you have the following URLs:
>>
>> http://www.test.com/image.gif
>> http://www.test.com/?src=image.gif?width=12
>>
>
> Don't use regular expressions to parse URLs - instead use URI.pm:
>
> http://cpan.uwinnipeg.ca/dist/URI

I agree. The program below shows a subroutine that extracts the file
type from either form of URL. It first checks whether there is a 'src'
option in the query and, if so, uses its value as the file name;
otherwise it uses the last segment of the URL path. The file type is
then extracted by capturing all trailing non-dot characters from the
file name.

(I assume your second address should read
<http://www.test.com/?src=image.gif&width=12> with an ampersand instead
of a second question mark?)

HTH,

Rob

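use strict;
use warnings;

use URI;

# Return the file type (extension) of the file that a URL refers to.
sub filetype_from_url {
    my $url  = URI->new($_[0]);
    my %form = $url->query_form;

    # Prefer the 'src' query parameter; otherwise use the last path segment.
    my $file = $form{src} || ($url->path_segments)[-1];

    # In list context the match returns the captured text: the trailing
    # run of non-dot characters in the file name.
    return $file =~ /([^.]+)\z/;
}

print filetype_from_url('http://www.test.com/image.gif'), "\n";
print filetype_from_url('http://www.test.com/?src=image.gif&width=12'), "\n";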
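
To illustrate why the ampersand matters, here is a minimal sketch, assuming the URI module is installed, of what happens when the second URL is parsed exactly as it was posted. Only the first question mark starts the query, so the second one should end up inside the value of 'src'. The variable names are only for illustration.

```perl
use strict;
use warnings;

use URI;

# The URL exactly as posted, with '?' where '&' was probably meant.
my $url = URI->new('http://www.test.com/?src=image.gif?width=12');

# Only the first '?' starts the query; the second one is ordinary data,
# so 'width' never becomes a parameter of its own.
my %form = $url->query_form;
print "src  = $form{src}\n";

# The same capture as in filetype_from_url, taken in list context.
my ($type) = $form{src} =~ /([^.]+)\z/;
print "type = $type\n";
```

This should print 'src  = image.gif?width=12' and 'type = gif?width=12', so with the URL as posted the file type would carry the stray '?width=12' along with it.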