On Feb 7, 2007, at 4:09 AM, Mark Overmeer wrote:

> And for 7/8bit you would like to keep track of the character-set used
> in the string, such that you can automatically convert to unicode when
> needed. And filenames defined inside your program to the charset used on
> a particular file-system. And... implicit conversions where we require
> explicit conversions now.

Wow, internalizing the Encode module. What a beautiful thought.

> Hum... so each string needs an associated charset label (which also
> determines the number of bytes per character) and each string operation
> needs to be aware that operands may require conversion before use...
> Maybe: if both encodings are different, then always convert both to
> U32.

I think you'd end up at worst-case memory usage often enough that you might as well default to 32-bit when reading in from filehandles, etc., but offer the option of compressing individual strings.

> Sounds like a lot of work, but rather straightforward for most of
> the way.

It's fun to think about, though I don't think any use at all of 32-bit string chars would be realistic without a major version increment or a fork. While mind-bogglingly wasteful memory habits and implicit conversion are both in the Perl spirit, that magnitude of spike in memory usage would render Perl unsuitable for some percentage of the applications it's currently used for.

But the savings in opportunity cost would be *vast*. Having one string type -- and having it be fixed-length to boot -- nukes the Gordian knot.

So please, disagree with me. :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
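For concreteness, here is a rough sketch of the scheme discussed in this thread: every string carries a charset label that also fixes its bytes per character, operands with the same label are joined as-is, and operands with different labels are both promoted to U32 first. It is written in Python purely for illustration and is not how Perl's internals work. The class `TaggedStr`, the `WIDTH` table, the choice of `latin-1` and `utf-32-le` as the only labels, and reading "compressing individual strings" as narrowing back to one byte per character are all hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical label table: only fixed-width encodings, mapped to bytes per
# character, so the label alone determines how to count characters.
WIDTH = {"latin-1": 1, "utf-32-le": 4}


@dataclass(frozen=True)
class TaggedStr:
    """Hypothetical byte buffer carrying a charset label (design sketch only)."""
    data: bytes
    charset: str

    def __post_init__(self) -> None:
        if self.charset not in WIDTH:
            raise ValueError(f"unsupported charset label: {self.charset}")

    @classmethod
    def from_text(cls, text: str, charset: str) -> "TaggedStr":
        return cls(text.encode(charset), charset)

    def __len__(self) -> int:
        # Fixed width makes the character count a simple division.
        return len(self.data) // WIDTH[self.charset]

    def text(self) -> str:
        return self.data.decode(self.charset)

    def to_u32(self) -> "TaggedStr":
        # Promote to 32-bit code units by decoding and re-encoding.
        if self.charset == "utf-32-le":
            return self
        return TaggedStr(self.text().encode("utf-32-le"), "utf-32-le")

    def narrowed(self) -> "TaggedStr":
        # Assumed form of "compressing": shrink to 1 byte/char if every
        # code point fits, otherwise keep the string unchanged.
        s = self.text()
        if all(ord(ch) < 256 for ch in s):
            return TaggedStr(s.encode("latin-1"), "latin-1")
        return self

    def __add__(self, other: "TaggedStr") -> "TaggedStr":
        # Same label: concatenate raw bytes, no conversion needed.
        if self.charset == other.charset:
            return TaggedStr(self.data + other.data, self.charset)
        # Different labels: convert both operands to U32 first.
        return TaggedStr(self.to_u32().data + other.to_u32().data, "utf-32-le")


a = TaggedStr.from_text("caf\u00e9", "latin-1")
b = TaggedStr.from_text("\u2603", "utf-32-le")
c = a + b
print(c.charset, len(c), len(c.data))
```

Joining the 4-character Latin-1 "café" with a single U32 character yields a 5-character U32 string occupying 20 bytes, which is the memory cost the thread is worried about; `narrowed()` only recovers that space when every character fits in one byte.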