develooper Front page | perl.unicode | Postings from June 2010

Variation In Decoding Between Encode and XML::LibXML

David E. Wheeler
June 15, 2010 22:55
Fellow Perlers,

I'm parsing a lot of XML these days, and came upon a Yahoo! Pipes feed that appears to mangle an originating Flickr feed. But the curious thing is, when I pull the offending string out of the RSS and just stick it in a script, Encode knows how to decode it properly, while XML::LibXML (and my Unicode-aware editors) cannot.

The attached script demonstrates. $str contains the bogus-looking character. Encode, however, seems to properly convert it to the "č" in "Laurinavičius" in the output. XML::LibXML, OTOH, leaves the name looking mangled in its output -- that is, broken. (If things look truly borked in this email too, please look at the attached script.)
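For readers without the attachment, here is a minimal sketch of the byte-level picture. It is not the attached script. It assumes the string holds the two raw UTF-8 bytes for "č" (U+010D), namely 0xC4 0x8D, and the variable names are made up for illustration. It shows only how the same bytes read under two different decodings, not what XML::LibXML does internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the string holds the two raw UTF-8 bytes for U+010D (0xC4 0x8D).
my $bytes = "Laurinavi\xC4\x8Dius";

# Interpreting the bytes as UTF-8 yields a single character at index 9.
my $as_utf8 = decode('UTF-8', $bytes);

# Interpreting the same bytes as Latin-1 yields two characters instead.
my $as_latin1 = decode('ISO-8859-1', $bytes);

printf "bytes: %d, as UTF-8: %d chars, as Latin-1: %d chars\n",
    length($bytes), length($as_utf8), length($as_latin1);
printf "UTF-8 reading:   U+%04X\n", ord(substr($as_utf8, 9, 1));
printf "Latin-1 reading: U+%04X U+%04X\n",
    map { ord } split //, substr($as_latin1, 9, 2);
```

Under those assumptions the UTF-8 reading gives U+010D ("č"), while the Latin-1 reading gives U+00C4 ("Ä") followed by U+008D, an invisible control character. That would be consistent with a single bogus-looking character showing up in an editor.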

So my question is, what gives? Is this truly a broken representation of the character and Encode just figures that out and fixes it? Or is there something off with my editor and with XML::LibXML?

FWIW, the character looks correct in my editor when I load it from the original Flickr feed. It's only after processing by Yahoo! Pipes that it comes out looking mangled.

Any insights would be appreciated.


