On 7 June 2012 17:33, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> * Jim Avera <perlbug-followup@perl.org> [2012-05-26 03:10]:
>> However it seems wrong to test for #chars != #bytes, because binary
>> data _should_ be passed as byte strings, that is, with Perl's internal
>> utf8 flag off.
>
> Disagree.

And I agree with you on this, although for different reasons than you.
My previous reply was only about this:

> The UTF8 flag is completely irrelevant to a string’s semantics. Wherever
> it’s treated as meaningful, that is a bug that should be fixed.

The UTF8 flag means I can't twiddle bits in the string. If the string
has the utf8 flag on, I cannot, at least not without potentially creating
broken utf8. And if a binary string is "upgraded" you get a mess. (Can
you tell I am bitter?)

> So it
> seems to me at first sight that the string should just reach the fast
> exit check untouched and be left for the remaining code to deal with.
>
> But on closer read I get a vague impression that the intent of the code
> in the whole function is based on confused notions about encodings. And
> that it therefore possibly should be done over entirely. I am not yet
> sure exactly what it is trying to achieve, though.

I am pretty sure the idea is that we use \x{..} escapes for codepoints
0x80 and up when the string is utf8. Octal is used otherwise. This is
actually significant data: it means you can round trip a raw string of
bytes and have them come back with the utf8 flag off, or send the same
"chars" as utf8. (Although whether this actually works via *eval* is
another question.)

> As an irrelevant aside,
>
>> s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
>
> … it’s a mystery to me why the replacement expression was spelled
>
>     '\x{'.sprintf('%x',...).'}'
>
> instead of simply
>
>     sprintf('\x{%x}',...)
>
> and similarly for several other substitutions within the function.

Yes indeed, I was thinking the same thing.
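For what it's worth, here is a minimal sketch showing that the two spellings should produce the same escaped output. The helper names esc_concat and esc_sprintf and the sample string are made up for illustration; this is not the code from the function under discussion, and the `if $bytes > length` guard is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: the spelling quoted above, concatenating
# the pieces around a sprintf("%x", ...) call.
sub esc_concat {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge;
    return $s;
}

# Hypothetical helper: the simpler spelling, with the whole escape
# in the format string. Single quotes keep the backslash literal.
sub esc_sprintf {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7f])/sprintf('\x{%x}', ord($1))/ge;
    return $s;
}

# Sample string (an assumption): one codepoint below 0x100 and one above.
my $str = "caf\x{e9} \x{263a}";

print esc_concat($str),  "\n";   # prints caf\x{e9} \x{263a}
print esc_sprintf($str), "\n";   # prints caf\x{e9} \x{263a}
```

Both helpers only touch characters outside \x00-\x7f, so plain ASCII passes through unchanged either way.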

Sorry about the tone of my previous mail; I shouldn't have replied
without saying more.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"