Sorry for answering this very basic mail last--it ended up at the bottom
in thread view.

On Mon, Feb 19, 2001 at 07:01:29PM -0600, Jarkko Hietaniemi wrote:
> I drafted the big \xHH vs \x{HH} vs chr vs v vs .. table, and (I
> thought) all I talked about was the internal representation, how
> many bytes are generated where.

You can see that I (with an honest reading of that message, and the ones
preceding it) didn't understand it this way. On one hand, this probably
points to the need for clarification. On the other, sorry for any
confusion.

> There were *internal* inconsistencies and I straightened them out.
> I do agree that _for_the_normal_user_ whether chr(128) is
> internally one or two bytes long shouldn't matter: and guess what,
> currently chr(128) eq "\x80" && chr(128) eq "\x{80}" && chr(128)
> eq v128 && chr(128) eq pack("C", 128). So what's the problem we
> are trying to solve here?

We are trying to answer: what gets printed when I output these guys to a
UTF8 handle? What are the results of isalpha, toupper, and the Unicode
property regexp patterns? In short, we are trying to identify which
character the darn thing is.

Andrew