Hello.

To emulate an EBCDIC platform, I ran a code snippet like the one below
with #define EBCDIC.

    for (uv = 0; uv <= PERL_UNICODE_MAX; uv++) {
        memzero(buff,UTF8_MAXBYTES);
        d = buff;
        d = uvuni_to_utf8(d, uv);
        if (!is_utf8_string(buff, d - buff))
            ++fail;
    }

Even though IS_UTF8_CHAR is removed, it still fails in 64 cases
within uv = 0x0000..0x10ffff
[the failing uv are 00A0..00BF (start = 0x80) and 0260..027F (start = 0xa0)].

Results

(1) perl-current:
    0x3ff60 failures in uv = 0x0000..0x10ffff.
    0x443ff60 failures in uv = 0x0000..0x7fffffff.

(2) utf8.h changed, utf8.c unchanged:
    0x40 failures in uv = 0x0000..0x10ffff.
    0x4400040 failures in uv = 0x0000..0x7fffffff.

(3) both utf8.h and is_utf8_char_slow changed (see the patch below):
    no failure in uv = 0x0000..0x10ffff.
    no failure in uv = 0x0000..0x7fffffff.

I found that is_utf8_char_slow() in utf8.c tries to compute the value of
a UTF-EBCDIC character without first converting the start octet from
UTF-EBCDIC to the I8-sequence. The remaining failures occur because
NATIVE_TO_UTF() is not applied to the start octet.

Regards,
sadahiro tomoyuki

! utf8.c utf8.h
diff -ur perl~/utf8.c perl/utf8.c
--- perl~/utf8.c Tue Jul 19 00:53:16 2005
+++ perl/utf8.c Sun Oct 02 16:18:56 2005
@@ -209,6 +209,9 @@
     slen = len - 1;
     s++;
+#ifdef EBCDIC
+    u = NATIVE_TO_UTF(u);
+#endif
     u &= UTF_START_MASK(len);
     uv = u;
     ouv = uv;
diff -ur perl~/utf8.h perl/utf8.h
--- perl~/utf8.h Wed Jun 08 00:04:28 2005
+++ perl/utf8.h Sun Oct 02 15:47:26 2005
@@ -258,6 +258,9 @@
 #endif
 #define SHARP_S_SKIP 2
+#ifdef EBCDIC
+/* IS_UTF8_CHAR() is not ported to EBCDIC */
+#else
 #define IS_UTF8_CHAR_1(p) \
     ((p)[0] <= 0x7F)
 #define IS_UTF8_CHAR_2(p) \
@@ -329,3 +332,4 @@
 #define IS_UTF8_CHAR_FAST(n) ((n) <= 4)
+#endif /* IS_UTF8_CHAR() for UTF-8 */
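
To illustrate why the start octet has to be converted before it is masked, here is a toy model in C. It is only a sketch under stated assumptions, not perl's code: toy_swap() is a hypothetical stand-in for the real NATIVE_TO_UTF()/UTF_TO_NATIVE() tables, plain UTF-8 is used in place of the real I8 sequence, and the names toy_encode, toy_decode and count_failures are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL native<->I8 mapping: flips two low bits of start octets
 * only. It is its own inverse. The real EBCDIC tables are different. */
static uint8_t toy_swap(uint8_t b)
{
    return (b >= 0xC0) ? (uint8_t)(b ^ 0x05) : b;
}

/* Encode uv (<= 0x10FFFF) as plain UTF-8 (assumed stand-in for I8),
 * then map every octet to the toy "native" form. Returns the length. */
static size_t toy_encode(uint32_t uv, uint8_t *buf)
{
    size_t len, i;

    if (uv < 0x80) {
        buf[0] = (uint8_t)uv;
        len = 1;
    } else if (uv < 0x800) {
        buf[0] = (uint8_t)(0xC0 | (uv >> 6));
        buf[1] = (uint8_t)(0x80 | (uv & 0x3F));
        len = 2;
    } else if (uv < 0x10000) {
        buf[0] = (uint8_t)(0xE0 | (uv >> 12));
        buf[1] = (uint8_t)(0x80 | ((uv >> 6) & 0x3F));
        buf[2] = (uint8_t)(0x80 | (uv & 0x3F));
        len = 3;
    } else {
        buf[0] = (uint8_t)(0xF0 | (uv >> 18));
        buf[1] = (uint8_t)(0x80 | ((uv >> 12) & 0x3F));
        buf[2] = (uint8_t)(0x80 | ((uv >> 6) & 0x3F));
        buf[3] = (uint8_t)(0x80 | (uv & 0x3F));
        len = 4;
    }
    for (i = 0; i < len; i++)
        buf[i] = toy_swap(buf[i]);
    return len;
}

/* Decode a toy "native" sequence of known length. When convert_start is 0,
 * the start octet is masked while still in native form, which is the
 * pattern the patch above corrects. */
static uint32_t toy_decode(const uint8_t *s, size_t len, int convert_start)
{
    static const uint8_t start_mask[5] = { 0x00, 0x7F, 0x1F, 0x0F, 0x07 };
    uint8_t u = s[0];
    uint32_t uv;
    size_t i;

    if (convert_start)
        u = toy_swap(u);            /* plays the role of NATIVE_TO_UTF(u) */
    uv = (uint32_t)(u & start_mask[len]);
    for (i = 1; i < len; i++)
        uv = (uv << 6) | (uint32_t)(toy_swap(s[i]) & 0x3F);
    return uv;
}

/* Round-trip every uv in 0..0x10FFFF and count the mismatches. */
static unsigned long count_failures(int convert_start)
{
    uint8_t buf[4];
    uint32_t uv;
    unsigned long fail = 0;

    for (uv = 0; uv <= 0x10FFFF; uv++) {
        size_t len = toy_encode(uv, buf);
        if (toy_decode(buf, len, convert_start) != uv)
            ++fail;
    }
    return fail;
}
```

Called from a main(), count_failures(1) reports no failures, while count_failures(0) reports a failure for every multi-octet sequence, because the toy mapping changes bits that lie inside every start mask. The real tables affect far fewer start octets, which is consistent with only 64 failures remaining in case (2).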