Dr.Ruud wrote on 2007-04-04 21:30 (+0200):
> > A  A byte-wide string with arbitrary binary data, will
> >    be space padded.
> Is that ASCII-space (0x20) or can it be locale or EBCDIC-space (0x40)
> too?

    memset(cur, datumtype == 'A' ? ' ' : '\0', len);

I don't know how ' ' is interpreted in C on an EBCDIC platform, or whether
any translation from ASCII to EBCDIC happens before compiling, etcetera.
So I can't answer this question.

But while I was reading the source, I found this interesting part in 5.9.5:

    /* 'A' strips both nulls and spaces */
    const char *ptr;
    if (utf8 && (symptr->flags & FLAG_WAS_UTF8)) {
        ...
            !is_utf8_space((U8 *) ptr)) break;
        ...
    } else {
        ...
            if (*ptr != 0 && !isSPACE(*ptr)) break;
        ...
    }

In 5.8.8, only the latter (isSPACE) is used. This means that the bug the
regex engine has is now copied to pack. Hurrah! :)

In short: the UTF8 flag is once again used to decide between ASCII and
Unicode semantics, while non-UTF8-flagged text strings are latin1, which is
Unicode too. If unpack wants to treat byte data encoded as UTF-8 the same
way it treats unencoded byte data, upgrading a string that contains a
non-breaking space must not make any difference. In Unicode, U+00A0 is
whitespace, but earlier Perls did not consider \xa0 whitespace in unpack.

    use v5.9.5;
    use strict;
    use warnings;
    use Test::More tests => 1;

    my $nbsp1 = "abc\xa0 ";      # ends in NO-BREAK SPACE plus an ASCII space
    my $nbsp2 = $nbsp1;
    utf8::upgrade($nbsp2);       # same characters, now with the UTF8 flag on

    my $unpacked1 = unpack("A*", $nbsp1);
    my $unpacked2 = unpack("A*", $nbsp2);

    # The internal encoding must not change what 'A' strips.
    is($unpacked1, $unpacked2, "unpack 'A*' ignores the UTF8 flag");

I maintain that supporting UTF8-flagged strings in unpack is a waste of
effort. But if it is done, it must be done correctly and compatibly, or the
hurting continues and the effort will have been in vain.
-- 
kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.