perl.perl5.porters | Postings from February 2008

UTF8 problem with Perl 5.10.0

From: Phil Harvey
Date: February 21, 2008 05:48
Subject: UTF8 problem with Perl 5.10.0
Message ID: 1A02C113-AA18-4DC9-802D-74326CE48ADC@owl.phy.queensu.ca

I am trying to decode a series of bytes that I know to be UTF-8 and
obtain the numerical code point of each character.  In previous
versions of Perl (back to 5.6.1), this was the behaviour:

 > perl -e 'print unpack("H*", pack("n*",unpack("U0U*","\xc3\xb6")))'
00f6

Which is what I expected, and what I require.

But in Perl 5.10.0, this happens:

 > perl-5.10.0 -e 'print unpack("H*", pack("n*",unpack("U0U*","\xc3\xb6")))'
00c300b6

Which obviously hasn't interpreted the string as UTF8.

Needless to say, this change in behaviour is rather distressing.  How  
can I change my unpack call so that this works again for all versions  
of Perl (>=5.6.1)?

TIA for any help you can provide.

	- Phil
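
One version-independent way to get the code points is to avoid "U0"
altogether and decode the UTF-8 bytes by hand.  The following is only a
minimal sketch, not a confirmed fix from this thread.  The helper name
utf8_bytes_to_codepoints is hypothetical, and the sketch assumes the
input is a plain byte string.  It rejects bad lead and continuation
bytes but does not check for overlong encodings or surrogates.

```perl
use strict;

# Hypothetical helper (not from the original post): turn a string of
# UTF-8 bytes into a list of code points using only unpack("C*") and
# bit arithmetic, so the result does not depend on "U0" semantics.
sub utf8_bytes_to_codepoints {
    my ($bytes) = @_;
    my @b = unpack("C*", $bytes);    # one integer per input byte
    my @cp;
    while (@b) {
        my $lead = shift @b;
        my ($need, $val);
        # The lead byte gives the number of continuation bytes to read
        # and the high-order bits of the code point.
        if    ($lead < 0x80)           { ($need, $val) = (0, $lead) }
        elsif (($lead & 0xE0) == 0xC0) { ($need, $val) = (1, $lead & 0x1F) }
        elsif (($lead & 0xF0) == 0xE0) { ($need, $val) = (2, $lead & 0x0F) }
        elsif (($lead & 0xF8) == 0xF0) { ($need, $val) = (3, $lead & 0x07) }
        else { die sprintf("invalid UTF-8 lead byte 0x%02x\n", $lead) }
        while ($need-- > 0) {
            my $cont = shift @b;
            die "truncated or malformed UTF-8 sequence\n"
                unless defined $cont and ($cont & 0xC0) == 0x80;
            # Each continuation byte contributes six more bits.
            $val = ($val << 6) | ($cont & 0x3F);
        }
        push @cp, $val;
    }
    return @cp;
}

# Same test input as above; prints 00f6
print unpack("H*", pack("n*", utf8_bytes_to_codepoints("\xc3\xb6"))), "\n";
```

Note that pack("n*") keeps only 16 bits per value, so code points above
U+FFFF would need different handling in the final step.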

