
Scan data for XML invalid characters and parse articles

From: John
Date: February 13, 2002 08:45
Subject: Scan data for XML invalid characters and parse articles
Message ID: jUsT.aNoTheR.mEsSaGe.iD.101361873329862@jpw3.com

I have a scalar variable containing HTML that needs to be converted 
to XML.  The HTML is not clean, so it contains characters that are 
invalid in XML (smart quotes, the 1/2 fraction character, etc.).  I 
need to detect whether any of these characters are present in the 
data and throw an error if they are.  What is the best way to do 
this?  I can't use an XML parser because the data is not really XML.
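
One possible way to do the check, as a rough sketch only: scan the 
scalar with a negated character class and report whatever falls 
outside it.  This assumes that "invalid" means anything other than 
tab, LF, CR, and printable ASCII, which may be stricter than the 
target XML encoding actually requires.  The helper name 
find_suspect_chars and the sample string are made up for 
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns one
# [offset, code] pair for each character that is not tab, LF, CR,
# or printable ASCII (0x20-0x7E).
sub find_suspect_chars {
    my ($data) = @_;
    my @hits;
    while ($data =~ /([^\x09\x0A\x0D\x20-\x7E])/g) {
        # pos() points just past the match, so step back one character.
        push @hits, [ pos($data) - 1, ord $1 ];
    }
    return @hits;
}

# Sample input: Windows-1252 smart quotes (0x93, 0x94) and the
# Latin-1 one-half character (0xBD).
my $html = "<p>He said \x93hello\x94 and ate \xBD a pie.</p>";

my @hits = find_suspect_chars($html);

# Throw an error if anything was found; eval is used here only so
# the sketch can print the message instead of exiting.
my $ok = eval {
    die sprintf("%d suspect character(s), first is 0x%02X at offset %d\n",
                scalar @hits, $hits[0][1], $hits[0][0]) if @hits;
    1;
};
print "Error: $@" unless $ok;
```

If the XML is meant to carry Latin-1 or UTF-8 text, the character 
class would need to be widened to match whatever that encoding 
allows.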

Also, if I have a block of text like this:

<!-- begin article1 title -->title1<!-- end article1 title -->
<!-- begin article1 body -->body1<!-- end article1 body -->
...
<!-- begin articleN title -->titleN<!-- end articleN title -->
<!-- begin articleN body -->bodyN<!-- end articleN body -->

where the ... means there could be any number of articles (fewer 
than 5), can anyone think of a relatively simple regex that will 
extract the titles and bodies?  I don't want article1, article2, 
etc. hard-coded in the regex.
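
For what it's worth, here is a minimal sketch of one regex that 
could do this.  It captures the article number and the section name 
(title or body) from the begin marker and uses backreferences to 
find the matching end marker.  It assumes every end marker repeats 
both the article number and the section name and is a well-formed 
comment; the %articles hash and the two-article sample text are 
made up for illustration.

```perl
use strict;
use warnings;

# Sample data in the format described above (two articles).
my $text = <<'END';
<!-- begin article1 title -->title1<!-- end article1 title -->
<!-- begin article1 body -->body1<!-- end article1 body -->
<!-- begin article2 title -->title2<!-- end article2 title -->
<!-- begin article2 body -->body2<!-- end article2 body -->
END

# $1 = article number, $2 = "title" or "body", $3 = the content.
# \1 and \2 require the end marker to match the begin marker;
# /s lets .*? span newlines, /g walks through every marker pair.
my %articles;
while ($text =~ /<!--\s*begin\s+article(\d+)\s+(title|body)\s*-->(.*?)<!--\s*end\s+article\1\s+\2\s*-->/gs) {
    $articles{$1}{$2} = $3;
}

# Print the articles in numeric order.
for my $n (sort { $a <=> $b } keys %articles) {
    print "article $n: title=[$articles{$n}{title}] body=[$articles{$n}{body}]\n";
}
```

Nothing in the pattern depends on how many articles there are, so 
the same loop works for one article or several.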

TIA,

   -John








