On Fri, Mar 30, 2007 at 01:31:22PM +0100, Nicholas Clark <nick@ccl4.org> wrote:
> > So fix it. It is easy to do, and I documented it years ago (during 5.6).
>
> "this one" that I was confident is a bug is the change of meaning on SvPV()
> And in turn what I'm not confident about is the fix. Sorry.

I can understand that it might be difficult, as perl itself likely relies on
the current meaning of SvPV.

However, one of the obvious fixes would be to change ExtUtils/typemap so that
types such as "const char *" no longer boil down to random bytes. Example:

   SV *compress (const char *data);

The right thing here is to use SvPVbyte, at least in the majority of cases.
The reason is that existing users either have to call downgrade explicitly
themselves or suffer from random problems.

etc.

> > Besides, without any doubt, the code that relies on pseudo-random
> > behaviour is certainly in the minority. The amount of code in the wild
> > that relies on "C" having 5.5 semantics is much larger. I doubt _anybody_
> > except me (or at least not very many people) understands that he has to
> > downgrade scalars before passing them into unpack to decode structures.
>
> I don't know enough about "C" in pack offhand to know what the right thing to
> do is.

The right thing to do is to follow the documentation and existing code.

Could you tell me why almost every other 5.6 bug was fixed in 5.8, but
gratuitous breakage of large parts of CPAN is accepted with this change?
What's the rationale behind keeping this 5.6 bug, while fixing the rest?

For example, take a network protocol that sends packets prefixed with a
2-byte length header, a type, and data. There is currently no unpack format
available to do this, as:

   unpack "Cn", $data

gives different results depending on the history of the string in $data. If
there were a pack type that gave me the 5.005 behaviour of returning a single
character, I could use it:

   unpack "Wn", $data;

but there simply isn't.
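
To make the "history of the string" point concrete, here is a minimal sketch.
It assumes only the built-in utf8::upgrade and utf8::downgrade functions;
decode_header is a made-up name used for illustration, and the packet contents
are invented. It unpacks the same header from a plain string, from an upgraded
copy that compares equal to it, and from a copy that was explicitly downgraded
first (the workaround mentioned above). Whether the first two lines of output
agree depends on the perl version, so no output is claimed here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original message): decode the "Cn"
# header from the example above, i.e. one unsigned char followed by a
# 16-bit big-endian value, after forcing the byte representation.
sub decode_header {
    my ($data) = @_;
    my $bytes = $data;          # work on a copy, leave the caller's scalar alone
    utf8::downgrade($bytes);    # dies if $data holds characters above 255
    return unpack "Cn", $bytes;
}

my $packet   = pack "Cn", 200, 513;   # three bytes: C8 02 01
my $upgraded = $packet;
utf8::upgrade($upgraded);             # same characters, different internal encoding

print "strings are ", ($packet eq $upgraded ? "equal" : "different"), "\n";
print "plain:      @{[ unpack 'Cn', $packet ]}\n";
print "upgraded:   @{[ unpack 'Cn', $upgraded ]}\n";   # may differ on some perl versions
print "downgraded: @{[ decode_header($upgraded) ]}\n";
```

The explicit downgrade is exactly the step that, as argued above, hardly
anybody knows they have to take.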

Besides, all code does use "C", so the right thing is to move the new pack
type to a different modifier.

(In my personal opinion, of course, pack should not expose the internal
encoding at all. Use Devel::Peek or so, or one of the functions in the utf8::
module. The first one who shows me code that needs the peculiar
nondeterministic behaviour of unpack "C" gets a prize.)

> I don't like anything in Perl space that lets the abstraction leak, and "C" is
> one of them.

So why not fix it? Nobody made such a fuss when they fixed the remaining bugs
from 5.6. For example, PApp, one of my older modules using unicode, is full
of code such as this:

   Convert::Scalar::utf8_on($_); # DEVEL7952 bug workaround #d# #FIXME#

for various values of DEVEL and workaround. Some of that code broke in 5.8
because 5.8 did the right thing (not 5.8.0, mind you, as this fixing went on
during 5.8.x). *Nobody* argued my case of "it breaks existing code", not even
me, because it's clearly a bugfix that lets perl code just work, both old
code and new code (which is the beauty of the perl unicode model).

> The third thing that you didn't mention which I consider distinct from the two
> behaviours you did is that the encoding affects how regexps match, and
> lc/uc/lcfirst/ucfirst.

The difference is that I haven't seen code break so badly because of that. I
see lots of code break because of the incompatible change in the meaning of
"C", though. (In fact, I haven't even seen a difference, apart from when use
locale is active, which is a rare case.)

The other difference to that case is that those bugs are getting fixed, while
in the case of "C", people just ignore the problem, which increases over
time, saying they don't know why they should fix this bug.

And as I said, there is no pack type that gives me the old meaning of "C"
that every structure-decoding program relies on. That's gratuitous
undocumented breakage.

(It really is undocumented, because all of the perl documentation tells me
that the internal encoding doesn't surface, and the small hint in the pack
description for "C" seems to reinforce this, as it tells me it works "even in
the presence of Unicode"!)

In any case, could you please tell me why you accept obvious breakage of old
code in this case? I really wanna know.

The only argument in favour I have heard so far is that the camelbook
documents it in some obscure way. But that cannot be a reason to keep a bug.
If the camelbook describes buggy behaviour, it needs a fix. It is insane to
force every existing perl program that uses that feature to be changed in a
way that contradicts the rest of the documentation, is unintuitive and
generally useless (again, show me a useful application for unpack "C" with
5.8 semantics).

-- 
                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __      pcg@goof.com
      --==---/ / _ \/ // /\ \/ /      http://schmorp.de/
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE