Facing problem with HTML::Parser

From: Shivani Palle
Date: August 12, 2016 18:47
Subject: Facing problem with HTML::Parser
Message ID: CAH0Myt8GNrMXqHB4ge-FnB+qkpzL4K_5SAVx=eBcU=QqaN5hMA@mail.gmail.com

Hi,


I am facing an issue while using HTML::Parser. Please help me.

*Issue:*

I am using HTML::Parser to parse all the HTML files throughout a directory
tree and extract the hard-coded strings from them (the text between the
tags).

The code looks like this:

#!/usr/bin/perl -w
package Example;
require HTML::Parser;
@Example::ISA = qw(HTML::Parser);
use File::Find;
use File::Basename;

#my @files = glob("*.thtml");
find({ wanted => \&process_file, no_chdir => 1 },
     "/mnt/src/xxx/git/xxx-ive-rdv/");

# Text handler: HTML::Parser calls this for each piece of text it reports.
sub text
{
    my ($self, $text) = @_;
    $self->{TEXT} .= $text."\n";
}

#foreach $file (@files){
sub process_file {
    if (-f $_) {
        # Match only names ending in a literal ".thtml"
        if ($_ =~ m/\.thtml$/i) {
            #my($file, $dir, $ext) = fileparse($_);
            my $file = $_;
            #step1: Parsing the html file and storing the parsed content in another file
            my $parser = Example->new;
            $parser->ignore_elements(qw(script)); #ignoring script elements
            $parser->parse_file($file);
            print $parser->{TEXT};

            open(my $fh, '>', 'parserOutput.txt')
                or die "Cannot open parserOutput.txt: $!";
            print $fh $parser->{TEXT};
            close $fh;
        }
    }
}



*Failing case*:

The parser is breaking some lines into two.
For example, I have the following line.

*Before Parsing:*
<label for="chkInstallAgent">Install Agent for this role</label>

*After Parsing*:
Install Agent for this
role

There is no tag inside "Install Agent for this role", but the text is still
split into two lines.
Can you please help me with this?



Thanks,
Sivani
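
A possible explanation, offered as a sketch rather than a confirmed diagnosis
of the files above: the HTML::Parser documentation says that, by default, text
is handed to the text handler as soon as possible, so one run of text can
arrive in several pieces when the input is fed in chunks (as parse_file does).
Since the text method above appends "\n" after every piece, a split would show
up as a line break. The documented unbroken_text attribute asks the parser to
report each run of text in one piece. The helper name collect_text and the
two-chunk split of the label are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: feed the given chunks to a fresh parser and return
# every piece of text that the text handler received.
sub collect_text {
    my ($chunks, $unbroken) = @_;
    my @pieces;
    my $parser = HTML::Parser->new(
        api_version => 3,
        text_h      => [ sub { push @pieces, shift }, 'dtext' ],
    );
    $parser->unbroken_text($unbroken);   # 1 = report each text run whole
    $parser->parse($_) for @$chunks;     # simulate chunked input
    $parser->eof;                        # flush anything still buffered
    return @pieces;
}

# Assumption: the chunk boundary happens to fall in the middle of the label.
my @chunks = (
    '<label for="chkInstallAgent">Install Agent for this ',
    'role</label>',
);

my @split  = collect_text(\@chunks, 0);  # default behaviour
my @joined = collect_text(\@chunks, 1);  # unbroken_text enabled

print scalar(@split),  " piece(s) with the default settings\n";
print scalar(@joined), " piece(s) with unbroken_text enabled\n";
print "[$_]\n" for @joined;
```

If this is indeed the cause, calling $parser->unbroken_text(1) before
$parser->parse_file($file) in the original script may be enough to keep the
label text on one line.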
