develooper Front page | perl.beginners | Postings from February 2002

Scan data for XML invalid characters and parse articles

February 13, 2002 08:45
I have a scalar variable containing HTML that needs to be converted 
to XML.  It's not the best HTML so it has invalid characters (like 
smart quotes, the 1/2 fraction character, etc.).  I need to determine if these 
characters exist in the data and throw an error if they do.  What 
is the best way to do this?  I can't use an XML parser because it's 
not really XML.
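
One possible approach to the check described above, as a rough sketch only. XML 1.0 itself permits most Unicode characters, so which characters count as "invalid" depends on the encoding the output will declare. The sketch assumes the output should be plain ASCII, so it flags everything except tab, LF, CR and printable ASCII. The names find_suspect_chars and assert_clean are made up for illustration, and the allowed set is an assumption to adjust.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [offset, code] pairs, one for each
# character outside the assumed-safe set (tab, LF, CR, printable ASCII).
sub find_suspect_chars {
    my ($text) = @_;
    my @hits;
    while ($text =~ /([^\x09\x0A\x0D\x20-\x7E])/g) {
        # pos() is the offset just past the one-character match
        push @hits, [ pos($text) - 1, ord $1 ];
    }
    return @hits;
}

# Hypothetical wrapper: returns 1 for clean text, dies with a report otherwise.
sub assert_clean {
    my ($text) = @_;
    my @hits = find_suspect_chars($text);
    return 1 unless @hits;
    my $report = join "\n",
        "Found " . scalar(@hits) . " suspect character(s):",
        map { sprintf("  offset %d: 0x%02X", $_->[0], $_->[1]) } @hits;
    die "$report\n";
}

# Bytes 0x93 and 0x94 are the smart quotes in Windows-1252 (assumed input encoding).
my $html = "He said \x93hello\x94";
eval { assert_clean($html); 1 } or print "Rejected: $@";
```

Returning the offsets instead of dying on the first hit makes it easier to report every bad character in one pass.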

Also, if I have a block of text like this:

<!-- begin article1 title -->title1<!-- end article1 title -->
<!-- begin article1 body -->body1<!-- end article1 body -->
...
<!-- begin articleN title -->titleN<!-- end articleN title -->
<!-- begin articleN body -->bodyN<!-- end articleN body -->

Where the ... means there could be some number of articles (less 
than 5), can anyone think of a relatively simple regex (I mean I 
don't want to have article1, article2, etc. hard-coded in the regex) 
that will extract the titles and bodies?
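
A minimal sketch of one way to do this, assuming the markers always follow the "begin articleX title|body" pattern shown above. The article id is captured instead of hard-coded, and a backreference ties each end marker to its begin marker. The end marker is matched loosely (only the article id is checked), so slightly malformed closing comments in hand-written HTML are still accepted. The name extract_articles and the returned hash layout are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref shaped like
# { id => { title => ..., body => ... } }.
sub extract_articles {
    my ($text) = @_;
    my %articles;
    while ($text =~ m{
            <!--\s*begin\s+article(\w+)\s+(title|body)\s*-->  # start marker: id, field
            (.*?)                                              # content, as short as possible
            <!--\s*end\s+article\1\b.*?>                       # end marker with the same id
        }gsx) {
        $articles{$1}{$2} = $3;
    }
    return \%articles;
}

my $demo = join "\n",
    '<!-- begin article1 title -->title1<!-- end article1 title -->',
    '<!-- begin article1 body -->body1<!-- end article1 body -->';
my $found = extract_articles($demo);
for my $id (sort keys %$found) {
    print "article$id: $found->{$id}{title} / $found->{$id}{body}\n";
}
```

The /s modifier lets a title or body span several lines, and /x allows the comments inside the pattern. A hash does not keep document order, so sort the keys (as above) or push onto an array if the order matters.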


