* Marvin Humphrey (marvin@rectangular.com) [070207 15:21]:
> On Feb 7, 2007, at 4:09 AM, Mark Overmeer wrote:
>> And for 7/8bit you would like to keep track of the character-set used
>> in the string, such that you can automatically convert to unicode when
>> need.  And filenames defined inside your program to the charset
>> used on a particular file-system.  And... implicit conversions where we
>> require explicit conversions now.
>
> Wow, internalizing the Encode module.  What a beautiful thought.
>
>> so each string needs an associated charset label
>
> I think you'd end up at worst case memory usage often enough that you
> might as well default to 32 when reading in from filehandles, etc,
> but offer the option of compressing individual strings.

Why?  The charset label is like a constant: it can be shared between
strings.  Most programs do not handle many different charsets, so I do
not share your fear that all strings will become 32 bits wide (or utf8
or utf16 as you wish, when you like to upgrade to that).

> It's fun to think about, though I don't think any use at all of
> 32-bit string chars would be realistic without a major version
> increment or a fork.

Is that true?  Well, yes... probably some things need to be changed or
extended on the XS interface level.  Nice hackathon subject, by the way.
-- 
Regards,
               MarkOv

------------------------------------------------------------------------
       Mark Overmeer MSc                                MARKOV Solutions
       Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
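
To illustrate the point that a charset label can be shared between strings, here is a minimal sketch in Python. It is not Perl internals and not anything proposed in this thread: the class `LabeledString` and its methods are hypothetical names made up for the example. Each string keeps its compact byte form plus a reference to one shared (interned) label, and is only converted when two different charsets meet.

```python
import sys


class LabeledString:
    """Hypothetical byte string that carries a charset label (illustration only)."""

    __slots__ = ("raw", "charset")

    def __init__(self, raw: bytes, charset: str):
        self.raw = raw
        # Interning makes equal labels refer to one shared object, so the
        # per-string cost of the label is a single reference.
        self.charset = sys.intern(charset.lower())

    def to_unicode(self) -> str:
        # The "explicit conversion" that the label makes automatic.
        return self.raw.decode(self.charset)

    def __add__(self, other: "LabeledString") -> "LabeledString":
        if self.charset is other.charset:
            # Same charset: concatenate the bytes and stay compact.
            # (Assumes a stateless encoding such as latin-1.)
            return LabeledString(self.raw + other.raw, self.charset)
        # Different charsets: convert implicitly via Unicode.
        # Storing the result as utf-8 is an assumption of this sketch.
        text = self.to_unicode() + other.to_unicode()
        return LabeledString(text.encode("utf-8"), "utf-8")


a = LabeledString("caf\xe9".encode("latin-1"), "latin-1")
b = LabeledString(b" au lait", "Latin-1")
c = a + b                                  # still one byte per character
d = a + LabeledString("\u65e5\u672c".encode("utf-8"), "utf-8")  # upgraded
```

In this sketch only `d`, which mixes charsets, pays for a wider encoding; `a`, `b` and `c` keep their 8-bit form and share the same label object.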