On Sun, Feb 04, 2007 at 08:43:15PM +0900, SADAHIRO Tomoyuki wrote:
> > > > At least his idea would not work on EBCDIC platforms like IBM z/OS.
> >
> > I chose to make UTF-8 the encoding used for strings (some people would
> > say this is the internal encoding and thus should not matter).
> > Support for EBCDIC would take the form of converting input/output to
> > and from EBCDIC.
>
> There are many parts of the Perl internals that assume the Unicode
> encoding has the same octet representation as the native encoding
> (ASCII to UTF-8, or EBCDIC to UTF-EBCDIC).
> For example, '\n' in C on EBCDIC platforms is LF in UTF-EBCDIC as well;
> that is the internal assumption, while it is not LF in UTF-8.
> Your idea would require such conversion everywhere, not only in the
> execution code but also in the parser and the lexer.
> Input/output conversion alone would not be enough.

You convinced me, so I have to restore utfebcdic.h (and I probably broke
EBCDIC support in a few more places).
Is there some way to fake EBCDIC? Or some other way to test it?

But some other things are probably going to change on EBCDIC platforms,
like C<ord('A') == 65>, i.e. C<ord> returns the Unicode codepoint, and
also \x{41} would be an 'A'. Does that sound okay?

If we make \x{?..}? really insert codepoints, and not sometimes bytes,
we need an escape sequence for bytes. In my patch I used \x.. to do that
and only \x{..} to insert a codepoint, but I am not very happy about
that; maybe \x[..]? Other suggestions?

Gerard Goossen
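
P.S. To make Sadahiro's octet-identity point concrete, an untested
sketch (run on an ASCII platform; the EBCDIC values in the comments are
from the documentation, not something I have tested):

    use Encode ();
    my $native  = "\n";
    my $encoded = Encode::encode('UTF-8', $native);
    # For low characters the encoded form must be the same single octet
    # as the native form -- that is the invariant the internals assume.
    printf "native octets:  %s\n", unpack 'H*', $native;   # 0a
    printf "encoded octets: %s\n", unpack 'H*', $encoded;  # 0a, identical
    # On EBCDIC, '\n' is a different octet (0x15), and UTF-EBCDIC is
    # designed so the two still agree; raw UTF-8 (0x0a) would break it.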
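
And a sketch of what the proposed C<ord>/\x{41} semantics would mean on
an EBCDIC platform (hypothetical behaviour under my patch; stock Perl
there returns the native value, ord('A') == 193):

    print ord('A'), "\n";   # 65 -- the Unicode codepoint, not the native byte
    print "\x{41}", "\n";   # 'A' -- \x{..} inserts a codepoint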
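
Finally, the escape split from my patch (hypothetical semantics -- in
stock Perl, \xE9 and \x{E9} mean the same character):

    my $char = "\x{E9}";   # codepoint U+00E9, always a character
    my $byte = "\xE9";     # under the patch: the raw octet 0xE9
    # with the \x[..] alternative, "\x[E9]" would be the raw octet
    # instead, leaving \x.. free to mean a codepoint like \x{..}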