Regexp-based XML parser (XML::Parser::Lite)

From: Paul Kulchenko
Date: July 31, 2001 18:22
Subject: Regexp-based XML parser (XML::Parser::Lite)
Message ID: 20010801012207.3411.qmail@web13504.mail.yahoo.com

Hi, Jarkko!

Short summary. To implement a pure XML parser, several options are
available:
1. shallow parser (regexp-based, but doesn't use ?{} or /e)
2. regexp-based (XML::Parser::Lite or similar)
3. grammar-based parser
4. ?

A shallow parser returns a stream of tokens, but generating events
from this stream requires a second matching pass.
A grammar-based parser can be implemented with Parse::RecDescent, but
Parse::RecDescent won't be included in the core in the near future.
A regexp-based parser that uses ?{} has problems with regexps invoked
from inside the generated callbacks.
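
To make the "second matching pass" of the shallow approach concrete,
here is a minimal sketch. The token pattern is a deliberately
simplified assumption (plain tags and text only, no comments, CDATA or
processing instructions), and the names tokenize(), events() and the
Start/End/Char labels are made up for illustration; this is not the
code of any existing module.

```perl
use strict;
use warnings;

# Pass 1: shallow tokenizing with a plain regexp (no ?{} and no /e).
# Simplified on purpose: does not handle comments, CDATA, processing
# instructions, or '>' inside attribute values.
sub tokenize {
    my ($xml) = @_;
    return $xml =~ /(<[^>]*>|[^<]+)/g;
}

# Pass 2: match each token again to decide which event it produces.
sub events {
    my ($xml) = @_;
    my @events;
    for my $token (tokenize($xml)) {
        if ($token =~ m{^</\s*([^\s>]+)}) {
            push @events, "End($1)";
        }
        elsif ($token =~ m{^<([^\s/>]+)[^>]*?(/?)>$}) {
            push @events, "Start($1)";
            push @events, "End($1)" if $2;   # empty-element tag such as <b/>
        }
        else {
            push @events, "Char($token)";
        }
    }
    return @events;
}

print join(' ', events('<a>hi<b/></a>')), "\n";
# prints: Start(a) Char(hi) Start(b) End(b) End(a)
```

Every token is matched twice, once to split the input and once to
classify it, which is the extra cost mentioned above.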

Here are the results of experiments with ?{}. Minimal code that
fails:

use re 'eval';
1 while 121211222=~/(1)(?{callback($1)})(?:(3)(?{callback($2)}))?/g;

sub callback {
  my $c = $_[0];
#  ;             # 0 # default
#  $_[0] =~ /1/; # 1 # in-place manipulations
#  $c =~ s/1/3/; # 2 # successful s///
#  1 =~ /1/;     # 3 # successful match
#  1 =~ /(1)/;   # 4 # successful with $1, localization doesn't help
  print $c;
}

It does NOT fail if there is no '?' at the end of the regexp (or if
there is no second section). It also does NOT fail if the inner regexp
doesn't match.
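
One possible way to sidestep the problem, offered only as a sketch and
not tested against the versions listed below: let the ?{} blocks do
nothing but record the captured text, and call the callback after the
outer match has returned, so that no regexp runs while another one is
still in progress. The names @queue and run() are hypothetical, and the
callback body uses the s/// from test 2.

```perl
use strict;
use warnings;

# Package variable: the ?{} blocks only push onto it (hypothetical name).
our @queue;

sub callback {
    my ($c) = @_;
    (my $copy = $c) =~ s/1/3/;   # a regexp inside the callback, as in test 2
    return $copy;
}

sub run {
    my ($input) = @_;
    my @out;
    local @queue;
    while ($input =~ /(1)(?{ push @queue, $1 })(?:(3)(?{ push @queue, $2 }))?/g) {
        # The outer match has finished here, so the callback's own
        # regexp no longer runs inside another regexp.
        push @out, map { callback($_) } @queue;
        @queue = ();
    }
    return join '', @out;
}

print run('121211222'), "\n";
```

The price is that callbacks see the captures slightly later than they
would from inside the pattern, which matters only if a callback is
supposed to influence the match in progress.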

The results are below. What I especially dislike is the endless output
of '1's from 5.7.2 in tests 1 and 3 (both run correctly by all previous
versions except 5.005) and the coredumps from 5.7.x in tests 2 and 4
(the results from the other versions are also incorrect, but at least
they don't dump core). All experiments were done on Linux and Windows;
please see if you can reproduce this in your environment.

Results:

# 0

perl5.00503
111
perl5.00503 (Windows, ActiveState)
111
perl5.6.0
1111
perl5.6.0 (Windows)
1111
perl5.6.1
1111
perl5.6.1 (Windows, ActiveState)
1111
perl5.7.1
1111
perl5.7.2
1111

# 1

perl5.00503
111
perl5.00503 (Windows, ActiveState)
111
perl5.6.0
1111
perl5.6.0 (Windows)
1111
perl5.6.1
1111
perl5.6.1 (Windows, ActiveState)
1111
perl5.7.1
1111
perl5.7.2
*endless output of '1'

# 2

perl5.00503
333
perl5.00503 (Windows, ActiveState)
333
perl5.6.0
*coredump
perl5.6.0 (Windows)
3*coredump
perl5.6.1
3
perl5.6.1 (Windows, ActiveState)
3
perl5.7.1
*coredump
perl5.7.2
*coredump

# 3

perl5.00503
111
perl5.00503 (Windows, ActiveState)
111
perl5.6.0
1111
perl5.6.0 (Windows)
1111
perl5.6.1
1111
perl5.6.1 (Windows, ActiveState)
1111
perl5.7.1
1111
perl5.7.2
*endless output of '1'

# 4

perl5.00503
111
perl5.00503 (Windows, ActiveState)
*nothing
perl5.6.0
1
perl5.6.0 (Windows)
1*coredump
perl5.6.1
1
perl5.6.1 (Windows, ActiveState)
1
perl5.7.1
*coredump
perl5.7.2
*coredump

Best wishes, Paul.