perl.unicode | Postings from June 2010

Variation In Decoding Between Encode and XML::LibXML

From: David E. Wheeler
Date: June 15, 2010 22:55
Subject: Variation In Decoding Between Encode and XML::LibXML
Message ID: 4ED1299B-6B4F-42B3-A100-9D4165654EA7@kineticode.com

Fellow Perlers,

I'm parsing a lot of XML these days, and came upon a Yahoo! Pipes feed that appears to mangle an originating Flickr feed. But the curious thing is, when I pull the offending string out of the RSS and just stick it in a script, Encode knows how to decode it properly, while XML::LibXML (and my Unicode-aware editors) cannot.

The attached script demonstrates. $str has the bogus-looking character "Ä". Encode, however, seems to properly convert it to the "č" in "Laurinavičius" in the output. XML::LibXML, OTOH, outputs it as "LaurinaviÄius" -- that is, broken. (If things look truly borked in this email too, please look at the attached script.)

So my question is, what gives? Is this truly a broken representation of the character, and Encode just figures that out and fixes it? Or is there something off with my editor and with XML::LibXML?

FWIW, the character looks correct in my editor when I load it from the original Flickr feed. It's only after processing by Yahoo! Pipes that it comes out looking mangled.
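
The attached script is not reproduced in this archive, so what follows is only a rough sketch of the kind of effect described, not the original script. It assumes the feed was double-encoded by way of ISO-8859-1, which is a guess on my part and not something established here. The variable names are made up for illustration, and only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the string in question.
# Correct UTF-8 for "č" (U+010D) is the two bytes C4 8D.
my $good_bytes = "Laurinavi\xC4\x8Dius";

# ASSUMPTION: the mangling is double encoding. Each byte is misread as a
# Latin-1 character (U+00C4, U+008D) and then encoded to UTF-8 again.
my $mangled_bytes = encode('UTF-8', decode('ISO-8859-1', $good_bytes));

# A single UTF-8 decode of the correct bytes yields the intended character.
my $good = decode('UTF-8', $good_bytes);

# A single UTF-8 decode of the mangled bytes yields the mojibake characters.
my $once = decode('UTF-8', $mangled_bytes);

# Undoing the damage: map those characters back to bytes, then decode again.
my $twice = decode('UTF-8', encode('ISO-8859-1', $once));

binmode STDOUT, ':encoding(UTF-8)';
print "good:  $good\n";
print "once:  ", join(' ', map { sprintf 'U+%04X', ord } split //, $once), "\n";
print "twice: $twice\n";
```

If the assumption holds, a parser that decodes the feed exactly once would be reporting what is actually in the data, and getting "č" back would require a second round trip through bytes somewhere along the way.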

Any insights would be appreciated.

Best,

David


