On Feb 7, 2007, at 10:37 AM, Mark Overmeer wrote:

>> I think you'd end up at worst case memory usage often enough that you
>> might as well default to 32 when reading in from filehandles, etc,
>> but offer the option of compressing individual strings.
>
> Why? The charset label is like a constant: can be shared between
> strings. Most programs do no handle many different charsets, so I
> do not not share your fear that all strings will become 32bits (or
> utf8 or utf16 as you wish, when you like to upgrade to that)

Space occupied by the charset labels isn't my concern. The scenario I'm worried about is where somebody has calibrated the memory consumption of a string-manipulating application to fit within available RAM, or is reasonably close to the threshold by happenstance.

Say someone reads in a string that occupies 300MB when encoded as UTF-8. Say it's mostly ASCII, but has a few code points above the BMP thrown in -- musical symbols like the sixteenth note (U+1D161), or what have you. Ka-boom, now that string occupies more than a gig, because every character takes four bytes instead of roughly one.

Such sudden, huge spikes in memory usage need not happen frequently to wreak havoc. The mere possibility that they might happen is enough to cause problems. Any critical app will have to be prepared for worst-case memory usage. Programmers being human, sometimes that won't happen, so intermittent failures will occur in production. Sure, we can blame the application programmer for failing to take pains, but when you can guarantee that some percentage of your users aren't going to do that, that's bad interface design.

Defaulting to 32-bit storage forces the programmer to deal with worst-case scenarios right away. If we give them the tools to compensate -- such as the ability to read into a UTF-8-encoded byte string rather than into a character string -- then the increased default RAM requirements wouldn't impose a hard limit on what you could do.
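To put rough numbers on the 300MB scenario, here is a minimal Python sketch, scaled down 1000x, that compares the size of a mostly-ASCII string encoded as UTF-8 with the same string stored at a fixed 32 bits per character (UTF-32). The helper names `storage_costs` and `sample` and the sample sizes are hypothetical, chosen only for illustration; this is not how perl stores strings internally.

```python
# Hypothetical illustration: storage cost of a mostly-ASCII string
# under UTF-8 versus fixed-width 32-bit (UTF-32) encoding.

SIXTEENTH_NOTE = "\U0001D161"  # MUSICAL SYMBOL SIXTEENTH NOTE, outside the BMP


def storage_costs(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf32_bytes) for text, excluding any BOM."""
    utf8_bytes = len(text.encode("utf-8"))
    # The "-le" variant is used because plain "utf-32" prepends a 4-byte BOM.
    utf32_bytes = len(text.encode("utf-32-le"))
    return utf8_bytes, utf32_bytes


def sample(n_ascii: int, n_astral: int) -> str:
    """Build a mostly-ASCII string with a few code points above the BMP."""
    return "a" * n_ascii + SIXTEENTH_NOTE * n_astral


if __name__ == "__main__":
    # 300,000 ASCII characters plus 10 sixteenth notes (1/1000 of 300MB).
    text = sample(300_000, 10)
    utf8, utf32 = storage_costs(text)
    print(f"UTF-8:  {utf8:,} bytes")
    print(f"UTF-32: {utf32:,} bytes")
    print(f"ratio:  {utf32 / utf8:.2f}x")
```

For this sample the UTF-8 form takes 300,040 bytes (each sixteenth note costs 4 bytes) while the 32-bit form takes 1,200,040 bytes, so a handful of astral characters is enough to push the whole string to nearly four times its UTF-8 size.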
>> It's fun to think about, though I don't think any use at all of 32-bit
>> string chars would be realistic without a major version increment
>> or a fork.
>
> Is that true? Well, yes... probable on the XS interface level
> some things need to be changed or extended.

That would certainly be true as well. What I was getting at, though, was that a sudden, dramatic increase in worst-case RAM requirements shouldn't be considered backwards compatible.

> Nice hackathon subject, by the way.

Sounds like fun. :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/