Re: XML::Mini question
From: Manfred Lotz
Date: April 18, 2012 22:43
Subject: Re: XML::Mini question
Message ID: 20120419073942.5e1328bd@arcor.com
On Wed, 18 Apr 2012 22:23:37 +0200
Manfred Lotz <manfred.lotz@arcor.de> wrote:
> On Thu, 19 Apr 2012 06:15:47 +1000
> "Owen" <rcook@pcug.org.au> wrote:
>
> >
> > > Hi there,
> > > I've got a question about XML::Mini.
> > >
> > > > When parsing an XML document I want to preserve white space,
> > > > for various reasons. However, it doesn't really work.
> > >
> > > Minimal example:
> > >
> > > > #!/usr/bin/perl
> > >
> > >
> > > use strict;
> > > use warnings;
> > > use Data::Dumper;
> > > use XML::Mini::Document;
> > >
> > > my $XMLString = "<book> Learning Perl </book>";
> > >
> > > my $xmlDoc = XML::Mini::Document->new();
> > >
> > > $XML::Mini::IgnoreWhitespaces = 0;
> > >
> > > # init the doc from an XML string
> > > $xmlDoc->parse($XMLString);
> > >
> > > my $xmlHash = $xmlDoc->toHash();
> > >
> > > print Dumper($xmlHash);
> > >
> > >
> > > I get the following output:
> > > > $VAR1 = {
> > > 'book' => 'Learning Perl '
> > > };
> > >
> > >
> > > > I would have expected to get
> > > > 'book' => ' Learning Perl '
> > >
> > > instead.
> > >
> > >
> > > Any idea, what's going wrong?
> >
> >
> > > What happens if you set $XML::Mini::IgnoreWhitespaces = 1?
> >
> > Seems to me that 1 = yes
> >
>
> This is true.
>
> > What does the documentation say?
> >
>
> > If I set it to 1 then I get
> > 'book' => 'Learning Perl'
> >
> > which is even worse. Please note that I don't want white space to
> > be ignored.
>
>
Hm, I had no better idea than to look at the source code. I think I
found what happens.
if ($XMLString =~
m/^\s*(<\s*([^\s>]+)([^>]+)\/\s*>| # <unary \/>
<\?\s*([^\s>]+)\s*([^>]*)\?>| # <? headers ?>
<!--(.+?)-->| # <!-- comments -->
<!\[CDATA\s*\[(.*?)\]\]\s*>\s*| # CDATA
<!DOCTYPE\s*([^\[>]*)(\[.*?\])?\s*>\s*| # DOCTYPE
<!ENTITY\s*([^"'>]+)\s*(["'])([^\11]+)\11\s*>\s*| # ENTITY
([^<]+))(.*)/xogsmi) # plain text
IMHO, here is the bug. The leading ^\s* sits outside the alternation,
so leading white space is deleted in every case. That is OK unless the
match is plain text.
I changed it like this:
if ($XMLString =~
m/(^\s*<\s*([^\s>]+)([^>]+)\/\s*>| #<unary \/>
^\s*<\?\s*([^\s>]+)\s*([^>]*)\?>| # <? headers ?>
^\s*<!--(.+?)-->| # <!-- comments -->
^\s*<!\[CDATA\s*\[(.*?)\]\]\s*>\s*| # CDATA
^\s*<!DOCTYPE\s*([^\[>]*)(\[.*?\])?\s*>\s*| # DOCTYPE
^\s*<!ENTITY\s*([^"'>]+)\s*(["'])([^\11]+)\11\s*>\s*| # ENTITY
([^<]+))(.*)/xogsmi) # plain text
Now leading white space is deleted in all cases except plain text, and
the output is:
$VAR1 = {
'book' => ' Learning Perl '
};
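
The effect of moving ^\s* can be reproduced without XML::Mini. The
following is only a minimal sketch, not the module's code: both
patterns are cut down to a single tag branch plus the plain-text
branch, and the names $original, $patched and plain_text are made up
for illustration.

```perl
use strict;
use warnings;

# What the parser sees once "<book>" has been consumed (assumed input).
my $text = " Learning Perl </book>";

# Simplified stand-in for the original pattern: ^\s* is outside the
# alternation, so it also eats white space in front of plain text.
my $original = qr/^\s*(<[^>]+>|([^<]+))(.*)/s;

# Simplified stand-in for the changed pattern: ^\s* is attached only
# to the tag branch, so the plain-text branch sees the leading space.
my $patched = qr/(^\s*<[^>]+>|([^<]+))(.*)/s;

# Return what the plain-text branch (capture group 2) matched.
sub plain_text {
    my ($re, $input) = @_;
    return $input =~ $re ? $2 : undef;
}

my $old = plain_text($original, $text);
my $new = plain_text($patched,  $text);

printf "original: '%s'\n", $old;   # original: 'Learning Perl '
printf "patched:  '%s'\n", $new;   # patched:  ' Learning Perl '
```

The capture group numbers are the same in both variants, because the
outer parenthesis is still group 1; only the position of ^\s* differs.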
--
Manfred