perl.unicode | Postings from October 2010

Am I correct in thinking that the only way to get ord() to return a value over 256 is to send the character as a Unicode string instead of a byte string?

From: Dan Muey
Date: October 28, 2010 12:54
Subject: Am I correct in thinking that the only way to get ord() to return a value over 256 is to send the character as a Unicode string instead of a byte string?
Message ID: DEBC3BEF-958A-4BA1-B56D-339774957FFB@cpanel.net

In other words, is there any character that will make ord() return a value over 256 when passed in as a byte string?

For example, note the difference in output between a Unicode string and a byte string for character 257: as a Unicode string, ord() returns 257; as a byte string, it returns 196 (0xC4, the first byte of the character's UTF-8 encoding).

$ perl -C6 -le 'print "Character 257 info:";print "\tunicode \\x{} notation: " . sprintf(q{\x{%x}}, 257);print "\tOutput as Unicode string \x{101}";print "\tunicode string \\x{} notation ord(): " . ord("\x{101}");print "\tbyte string grapheme ord(): " . ord "\xc4\x81";print "\tbyte string literal ord(): " . ord "ā";'
Character 257 info:
	unicode \x{} notation: \x{101}
	Output as Unicode string ā
	unicode string \x{} notation ord(): 257
	byte string grapheme ord(): 196
	byte string literal ord(): 196
$
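
The same comparison can be sketched as a short script. This is only an illustration, assuming the core Encode module is used to move between the character string and its UTF-8 octets; the variable names are made up for the example. It shows that every element of the byte string is a single octet, so ord() on it stays in the 0..255 range.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string holding U+0101 (LATIN SMALL LETTER A WITH MACRON).
my $chars = "\x{101}";

# Its UTF-8 encoding as a byte string: the two octets 0xC4 0x81.
my $bytes = encode('UTF-8', $chars);

# ord() looks only at the first character (or octet) of its argument.
my $ord_chars = ord($chars);
my $ord_bytes = ord($bytes);

# Every element of the byte string, as a number.
my @octets = map { ord($_) } split //, $bytes;

# Decoding the octets gives back the original character.
my $ord_decoded = ord(decode('UTF-8', $bytes));

print "ord of character string: $ord_chars\n";    # 257
print "ord of byte string:      $ord_bytes\n";    # 196
print "all octets:              @octets\n";       # 196 129
print "ord after decoding:      $ord_decoded\n";  # 257
```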

This matters because on a given project I am using byte strings only, for consistency, and some encoders (e.g. Scalar::Quote::Q()) will change from byte-string-friendly notation for the UTF-8 bytes (e.g. \xE3\x8A\xB7) to Unicode-string notation (e.g. \x{32B7}), and I want to be sure I always use data that gets me the former rather than the latter :)
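
One possible way to normalize data to byte strings before it reaches such an encoder is sketched below. The helpers to_byte_string() and byte_notation() are hypothetical names invented for this example, not part of Scalar::Quote, and the sketch assumes that any string with Perl's internal UTF8 flag set holds characters that should be encoded as UTF-8, while everything else is already the desired octets.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the input as UTF-8 octets.
# Assumption: a string with the internal UTF8 flag set holds characters
# that should be encoded; anything else is treated as bytes already.
sub to_byte_string {
    my ($str) = @_;
    utf8::encode($str) if utf8::is_utf8($str);
    return $str;
}

# Hypothetical helper: render each octet outside printable ASCII as \xNN.
# Literal backslashes are left alone, so this is not a full quoting routine.
sub byte_notation {
    my ($str) = @_;
    my $bytes = to_byte_string($str);
    $bytes =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord($1))/ge;
    return $bytes;
}

print byte_notation("\x{32B7}"), "\n";      # \xE3\x8A\xB7
print byte_notation("\xE3\x8A\xB7"), "\n";  # \xE3\x8A\xB7
```

Both calls end up with the same three octets, so after to_byte_string() no element of the string can make ord() return more than 255.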

TIA!

--
Dan Muey