develooper Front page | perl.perl5.porters | Postings from August 2001

Re: Regexp-based XML parser (XML::Parser::Lite)

From: Jarkko Hietaniemi
Date: August 15, 2001 20:21
Subject: Re: Regexp-based XML parser (XML::Parser::Lite)
Message ID: 20010815222122.A10151@chaos.wustl.edu

On Tue, Jul 31, 2001 at 06:22:07PM -0700, Paul Kulchenko wrote:
> Hi, Jarkko!
> 
> Short summary. To implement a pure XML parser, several options are
> available:
> 1. shallow parser (regexp-based, but doesn't use ?{} or /e)
> 2. regexp-based (XML::Parser::Lite or similar)
> 3. grammar-based parser
> 4. ?
> 
> A shallow parser returns a stream of tokens, but generating events
> from this stream requires a second matching pass.
> A grammar-based parser can be implemented with Parse::RecDescent, but
> in the near future Parse::RecDescent won't be included in the core.

Not just that.  Damian's opinion is that P::R should *never* be
included in the core.  Not just because of size or what-to-include
concerns, but because Damian basically wants to rewrite the whole
thing from scratch.

> A regexp-based parser that uses ?{} has problems with regexps invoked
> from inside generated callbacks.

I suggest scrapping the use of ?{}: it's too fragile (as you found
out, painfully).  See the attached quick hack (which I disavow) that
uses more vanilla regexps and simple recursion.
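
As a rough illustration of that approach (this is not the attached hack,
only a minimal sketch), a parser built on plain \G//gc matching and
simple recursion might look like the code below.  The names parse and
parse_content and the [type => value] event tuples are made up for this
example and are not part of XML::Parser::Lite.  The sketch assumes
elements without attributes, and no comments, CDATA or entities.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse_content() and the event tuples are
# illustrative names, not an existing API.
# $text_ref: reference to the input string (its pos() carries the cursor)
# $events:   array ref collecting [type => value] events
# $open:     name of the enclosing element, undef at top level
sub parse_content {
    my ($text_ref, $events, $open) = @_;
    while (1) {
        if ($$text_ref =~ /\G<\/([\w:.-]+)\s*>/gc) {        # end tag
            die "unexpected </$1>\n"
                unless defined $open && $1 eq $open;
            push @$events, [ end => $1 ];
            return;
        }
        elsif ($$text_ref =~ /\G<([\w:.-]+)\s*\/>/gc) {     # empty element
            push @$events, [ start => $1 ], [ end => $1 ];
        }
        elsif ($$text_ref =~ /\G<([\w:.-]+)\s*>/gc) {       # start tag
            my $name = $1;              # copy $1 before recursing
            push @$events, [ start => $name ];
            parse_content($text_ref, $events, $name);       # parse children
        }
        elsif ($$text_ref =~ /\G([^<]+)/gc) {               # character data
            push @$events, [ char => $1 ];
        }
        else {
            last;                       # end of input or unparsable markup
        }
    }
    my $pos = pos($$text_ref) || 0;
    die "unparsable markup at offset $pos\n" if $pos < length $$text_ref;
    die "missing </$open>\n" if defined $open;
    return;
}

sub parse {
    my ($text) = @_;
    my @events;
    pos($text) = 0;                     # start matching at the beginning
    parse_content(\$text, \@events);
    return \@events;
}

my $events = parse('<a>hi<b/><c>x</c></a>');
print join(' ', map { "$_->[0]:$_->[1]" } @$events), "\n";
# prints: start:a char:hi start:b end:b start:c char:x end:c end:a
```

No code runs from inside a regexp here: each match completes, its
captures are copied, and only then is anything else called.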

> Here are the results of experiments with ?{}. Minimal code that
> fails:
> 
> use re 'eval';
> 1 while 121211222=~/(1)(?{callback($1)})(?:(3)(?{callback($2)}))?/g;
> 
> sub callback {
>   my $c = $_[0];
> #  ;             # 0 # default
> #  $_[0] =~ /1/; # 1 # in-place manipulations
> #  $c =~ s/1/3/; # 2 # successful s///
> #  1 =~ /1/;     # 3 # successful match
> #  1 =~ /(1)/;   # 4 # successful with $1, localization doesn't help
>   print $c;
> }
> 
> It does NOT fail if there is no '?' at the end of the regexp (or if
> there is no second section). It also does NOT fail if the internal
> regexp doesn't match.
> 
> Results are below. What I especially don't like is the endless output
> of '1's in tests 1 and 3 (both executed correctly by all previous
> versions except 5.005) and the coredumps of 5.7.x in tests 2 and 4
> (the results of the other versions are also incorrect, but they do not
> dump core). All experiments were done on Linux and Windows; see if you
> can reproduce it in your environment.
> 
> Results:
> 
> # 0
> 
> perl5.00503
> 111
> perl5.00503 (Windows, ActiveState)
> 111
> perl5.6.0
> 1111
> perl5.6.0 (Windows)
> 1111
> perl5.6.1
> 1111
> perl5.6.1 (Windows, ActiveState)
> 1111
> perl5.7.1
> 1111
> perl5.7.2
> 1111
> 
> # 1
> 
> perl5.00503
> 111
> perl5.00503 (Windows, ActiveState)
> 111
> perl5.6.0
> 1111
> perl5.6.0 (Windows)
> 1111
> perl5.6.1
> 1111
> perl5.6.1 (Windows, ActiveState)
> 1111
> perl5.7.1
> 1111
> perl5.7.2
> *endless output of '1'
> 
> # 2
> 
> perl5.00503
> 333
> perl5.00503 (Windows, ActiveState)
> 333
> perl5.6.0
> *coredump
> perl5.6.0 (Windows)
> 3*coredump
> perl5.6.1
> 3
> perl5.6.1 (Windows, ActiveState)
> 3
> perl5.7.1
> *coredump
> perl5.7.2
> *coredump
> 
> # 3
> 
> perl5.00503
> 111
> perl5.00503 (Windows, ActiveState)
> 111
> perl5.6.0
> 1111
> perl5.6.0 (Windows)
> 1111
> perl5.6.1
> 1111
> perl5.6.1 (Windows, ActiveState)
> 1111
> perl5.7.1
> 1111
> perl5.7.2
> *endless output of '1'
> 
> # 4
> 
> perl5.00503
> 111
> perl5.00503 (Windows, ActiveState)
> *nothing
> perl5.6.0
> 1
> perl5.6.0 (Windows)
> 1*coredump
> perl5.6.1
> 1
> perl5.6.1 (Windows, ActiveState)
> 1
> perl5.7.1
> *coredump
> perl5.7.2
> *coredump
> 
> Best wishes, Paul.
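
For comparison, the same callback sequence can be driven from an
ordinary while loop, with no ?{} blocks at all.  This is only a sketch:
run_callbacks is a hypothetical name, the callback returns its argument
instead of printing it, and the code has not been run against the Perl
versions listed above.  Because the callback is called after each match
has finished, its inner regexp never runs from inside another regexp.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the test case without (?{ }).
sub callback {
    my ($c) = @_;
    my $inner = '1' =~ /(1)/;   # inner match with a capture, as in test 4
    return $c;
}

sub run_callbacks {
    my ($input) = @_;
    my @out;
    while ($input =~ /(1)(3)?/g) {
        my ($first, $second) = ($1, $2);   # copy captures before calling out
        push @out, callback($first);
        push @out, callback($second) if defined $second;
    }
    return join '', @out;
}

print run_callbacks('121211222'), "\n";   # prints 1111
```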

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
