develooper Front page | perl.libwww | Postings from August 2016

Re: Facing problem with HTML::Parser

Thread Previous | Thread Next
From:
Shlomi Fish
Date:
August 13, 2016 08:45
Subject:
Re: Facing problem with HTML::Parser
Message ID:
20160813114509.6b1deaa1@telaviv1.shlomifish.org
Hi Shivani,

please reply to all recipients.

Some comments on your code:

On Thu, 11 Aug 2016 11:14:44 +0530
Shivani Palle <palleshivani@gmail.com> wrote:

> Hi,
> 
> 
> I am facing one issue while using HTML::Parser. Please help me.
>

first of all note that HTML::Parser should normally not be used directly and
instead one should use one of its subclasses such as HTML::TreeBuilder. For
more information, see:

http://perl-begin.org/uses/text-parsing/
 
> *Issue:*
> 
> I am using HTML::Parser to parse all the HTML files through out the
> directories to get hard coded strings from the html files(text between the
> tags).
> 
> the code is like this:
> 
>  #!/usr/bin/perl -w

Don't use the -w flag, see:

http://perl-begin.org/tutorials/bad-elements/#the-dash-w-flag

> package Example;

1. You're missing "use strict;" and "use warnings;".

2. Better not call the package "Example".

> require HTML::Parser;
> @Example::ISA = qw(HTML::Parser);

This should be:

	use parent 'HTML::Parser';

> use File::Find;
> use File::Basename;
> 
> #my @files = glob("*.thtml");
> find({ wanted => \&process_file, no_chdir => 1 },
> "/mnt/src/xxx/git/xxx-ive-rdv/");
> 
> #foreach $file (@files){
> sub process_file {
>    if (-f $_) {
>        if ($_ =~ m/(.thtml)$/i) {
>    #my($file, $dir, $ext) = fileparse($_);
>    my $file = $_;

$_ can be clobbered and devastated very easily so in this case should be
assigned to a lexical variable as soon as possible.

Furthermore, don't call variables "file":

http://perl-begin.org/tutorials/bad-elements/#calling-variables-file

>     #step1: Parsing the html file and storing the parsed content in another
> file
>     my $parser = Example->new;
>     $parser->ignore_elements(qw(script)); #ignoring script elements
>     $parser->parse_file($file);
>     print  $parser->{TEXT};
> 
>     sub text
>     {
>         my ($self,$text) = @_;
>         $self->{TEXT} .= $text."\n";
>     }

Don't declare functions inside functions like that:

http://perl-begin.org/tutorials/bad-elements/#nested_top_level_subroutines

>     open(my $fh, '>', 'parserOutput.txt');

Either "use autodie;" or add an "or die " error. See:

http://perl-begin.org/tutorials/bad-elements/#open-function-style

Furthermore, '>' is overwrite and will overwrite the output for every file.
Maybe you want '>>' (append) instead.

>     print $fh  $parser->{TEXT};

You should wrap the $fh in curly braces:

http://perl-begin.org/tutorials/bad-elements/#print_to_fh

Perhaps use Path::Tiny or a similar file I/O abstraction.

-----------------------

I should note that on online Perl forums we've been contacted for help by
many people who have given incredibly sub-optimal Perl code that they wrote
violating the knowledgeable community's best practices and recommendations.
Can you tell us how you learned Perl and what led you to these learning
resources? 

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish

Flock aims to be the browser for the social web, but I found it to be the
completely anti-social browser.

Please reply to the list if it's a mailing list post

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About