perl.fwp | Postings from July 2003

Re: Is this fun?

From: A. Pagaltzis
Date: July 15, 2003 08:21
Subject: Re: Is this fun?
Message ID: 20030715152034.GA3230@klangraum

* Keith C. Ivey <kcivey@cpcug.org> [2003-07-15 14:42]:
> which will be handled by the regex but may cause a parser to
> blow up (though some are more tolerant than others)

Did you read what I said? You need a tolerant parser indeed. Did
you take a look at HTML::Parser at all?

> | That leaves input data munging, which I do a lot of, and a
> | lot of input data these days is XML. Now here's the dirty
> | secret; most of it is machine-generated XML,

Is yours?

> | I've even gone to the length of writing a prefilter to glue
> | together tags that got split across multiple lines, just so I
> | could do the regexp trick.

Do you?

Sure, as long as you know your input follows a narrower
specification than "arbitrary valid markup", you can use that
knowledge to your advantage.

The deficiency of parsers lies in their interfaces; what we
really need is a generic matching engine that can be applied to
ordered collections not only of characters, but of arbitrary
objects, so that we could apply a pattern to, say, a stream of
XML parser events.
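
To make the idea concrete, here is a rough sketch of what matching a
pattern against parser events could look like. It is only an
approximation: it flattens the events into a string so that an ordinary
character regex can range over them, which is a workaround rather than
the generic engine wished for above. Python's standard html.parser
stands in for HTML::Parser purely for illustration, and the names
EventCollector, find_event_runs and LINK_WITH_PLAIN_TEXT are made up
for this example.

```python
import re
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record parser events as (kind, value) tuples (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Whitespace-only text is dropped to keep the event stream short.
        if data.strip():
            self.events.append(("text", data.strip()))


def find_event_runs(events, pattern):
    """Yield slices of `events` whose token encoding matches `pattern`.

    Each event becomes one token followed by exactly one space, so the
    number of spaces before a match offset equals an event index.
    Assumes tag names contain no whitespace.
    """
    tokens = [kind if kind == "text" else f"{kind}:{value}"
              for kind, value in events]
    encoded = "".join(token + " " for token in tokens)
    for match in re.finditer(pattern, encoded):
        first = encoded.count(" ", 0, match.start())
        last = encoded.count(" ", 0, match.end())
        yield events[first:last]


# An <a> element that contains nothing but text events.
LINK_WITH_PLAIN_TEXT = r"(?<!\S)start:a (?:text )+end:a "

if __name__ == "__main__":
    collector = EventCollector()
    collector.feed('<p>See <a href="x">the <b>docs</b></a> '
                   'and <a href="y">more</a>.</p>')
    collector.close()
    for run in find_event_runs(collector.events, LINK_WITH_PLAIN_TEXT):
        print(run)
```

The pattern is written over whole events instead of characters, which
is the point; the string encoding is just the cheapest way to borrow an
existing regex engine for it.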

-- 
Regards,
Aristotle
