Ok, last mail, because this is a different topic :)

On Sat, Mar 31, 2007 at 01:08:21AM +0200, Juerd Waalboer <juerd@convolution.nl> wrote:
> Marc Lehmann skribis 2007-03-31 0:25 (+0200):
> > If you send a compressed string over the network using JSON and decompress
> > it, you need to know that.
> Does JSON compress arbitrary data?

no.

> If so, then the user must do the decoding and encoding,

No, compression is something completely orthogonal to encoding. Neither
forces me to do the other.

> because arbitrary data only exists in byte form

That seems completely wrong to me.

> Once you dictate any specific encoding, it's no longer arbitrary.

JSON dictates unicode for the JSON text, and strongly hints at the use of
UTF-8 for interchange purposes.

> On the other hand, if JSON does text data only,

No, it supports binary data just as well, and that is used a lot, too. It
works just like perl without the bugs: you have a string type that can
store bytes, and it is up to the user to interpret them as she wants.

> it can just use any UTF encoding on both sides, and document it like
> that.

It is a bit complicated, but you can safely assume that 99% of all JSON is
UTF-8 encoded. In fact, you can recode all JSON documents into ASCII, too.
JSON::XS offers that, and JSON::XS by default encodes to/decodes from
UTF-8, but allows the user to decode/encode himself (there is a small
sketch of these options at the end of this mail).

JSON text is composed of unicode characters, and in Perl some JSON modules
store them as a simple Perl string. All of that is not well-supported by
most JSON modules, though; for example, JSON::XS is the only module for
perl that correctly decodes escaped surrogate pairs.

> Unless both sides are exactly the same platform (e.g. both Perl), you
> need to establish a protocol for sending data anyway. And that protocol
> should also describe encoding. If sender and receiver don't agree, you
> have a problem.

No, it doesn't have anything to do with the platform. Even when both sides
use Perl I need to decide on a common encoding. That's strictly outside the
JSON definition, though.

> > I am really frustrated at that. It makes perl as a whole rather
> > questionable for unicode use, as you constantly have to think about
> > the internals. And yes, that simply shouldn't be the case.
> I maintain that it isn't the case, for almost any programming job,
> unless you're indeed doing things with internals.

Well, the JSON::XS module certainly does things with the internals: it has
to flag some strings as UTF-X, and in fact flags all strings that way
unless you enable the shrink option, which is documented to try to shrink
the memory used in various ways (one way is to try to downgrade the
scalar).

Certainly, the user who reported the bug also didn't look at the internals.
Compress::Zlib called unpack "CCCV" or some such, though, which
unfortunately treats "V" very differently from "C": with "C" it looks at
the internals, while with "V" it does not and treats the string as an octet
string. The user suggested that JSON::XS corrupts binary data because the
data happens to be returned upgraded unless you set the shrink option.

However, Perl does not expose the internals elsewhere: the upgraded version
is semantically equivalent to the downgraded one unless you use an XS
module that uses SvPV directly or indirectly (considered a bug in Perl, if
I understood nick correctly), or unless you use unpack "C", which has a
different meaning in perl 5.6 than in perl 5.005 and has confusing
documentation.

The right thing for Compress::Zlib is not to use unpack "CCCV" but unpack
"UUUV", which seems completely weird to me, as no unicode was ever involved
*on the perl level*.
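To make concrete what "semantically equivalent" means here, a small sketch
(the sample strings are made up, and this assumes a reasonably recent
perl):

   use strict;
   use warnings;

   # two strings with the same four characters; only the internal
   # storage differs after the upgrade
   my $down = "caf\xe9";        # stored as latin-1 octets
   my $up   = "caf\xe9";
   utf8::upgrade ($up);         # same characters, stored as UTF-8 internally

   # at the perl level the two are indistinguishable
   print "equal\n" if $down eq $up;                              # prints "equal"
   print length ($down), " ", length ($up), "\n";                # prints "4 4"
   print ord substr ($down, 3), " ", ord substr ($up, 3), "\n";  # prints "233 233"

   # trouble only starts when something peeks at the internal bytes,
   # e.g. XS code calling SvPV, or unpack "C" on perls of that era,
   # which could then see the two-byte internal form of "\xe9"
   # instead of the character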
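And coming back to the JSON::XS encoding options mentioned above, this is
roughly how the three cases look (made-up sample data; the methods are the
ones from the JSON::XS documentation):

   use strict;
   use warnings;
   use JSON::XS;

   my $data = { name => "J\x{fc}rgen", id => 42 };   # made-up sample data

   # UTF-8 encoded octets, ready to be written to a socket or file
   my $utf8_octets = JSON::XS->new->utf8->encode ($data);

   # pure ASCII: every non-ASCII character becomes a \uXXXX escape, so
   # the result survives any ASCII-transparent transport unchanged
   my $ascii_text = JSON::XS->new->ascii->encode ($data);

   # or leave the encoding to the caller: without ->utf8, encode returns
   # a perl character string and decode expects one, so you can run it
   # through Encode (or anything else) yourself
   my $json_chars = JSON::XS->new->encode ($data);
   my $copy       = JSON::XS->new->decode ($json_chars);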
--
                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __      pcg@goof.com
      --==---/ / _ \/ // /\ \/ /      http://schmorp.de/
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE