
Re: Future Perl development

From: Marvin Humphrey
Date: February 7, 2007 14:25
Subject: Re: Future Perl development
Message ID: C9B20ADA-EB18-4359-99F9-4C864FF91B62@rectangular.com

On Feb 7, 2007, at 10:37 AM, Mark Overmeer wrote:

>> I think you'd end up at worst case memory usage often enough that you
>> might as well default to 32 when reading in from filehandles, etc,
>> but offer the option of compressing individual strings.
>
> Why?  The charset label is like a constant: can be shared between
> strings.  Most programs do not handle many different charsets, so I
> do not share your fear that all strings will become 32bits (or
> utf8 or utf16 as you wish, when you like to upgrade to that)

Space occupied by the charset labels isn't my concern.  The scenario
I'm worried about is where somebody has calibrated the memory
consumption of a string-manipulating application to fit within
available RAM, or is reasonably close to that threshold by happenstance.

Say someone reads in a string that occupies 300MB when encoded as  
UTF-8.  Say it's mostly ASCII, but has a few code points above the  
BMP thrown in -- musical symbols like the sixteenth note (U+1D161),  
or what have you.  Ka-boom, now that string occupies more than a gig.

Such sudden, huge spikes in memory usage need not happen frequently
to wreak havoc.  The mere possibility that they might happen is
enough to cause problems.  Any critical app will have to be prepared
for the worst-case scenario for memory usage.  Programmers being
human, sometimes that won't happen, so intermittent failure will
occur in production.

Sure, we can blame the applications programmer for failing to take  
pains, but when you can guarantee that some percentage of your users  
aren't going to do that, that's bad interface design.

Defaulting to 32-bit storage forces the programmer to deal with worst- 
case scenarios right away.  If we give them the tools to compensate  
-- such as the ability to read into a UTF-8-encoded byte string  
rather than into a character string -- then the increased default RAM  
requirements wouldn't impose a hard limit on what you could do.
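
As a rough illustration of that kind of tool, the sketch below (again Python, for illustration only; `count_non_bmp` is a hypothetical helper, not an existing or proposed Perl interface) keeps the data as UTF-8 bytes and decodes it one chunk at a time, so only a small window is ever held in widened character form:

```python
import codecs
import io


def count_non_bmp(stream, chunk_size=64 * 1024):
    """Count code points above U+FFFF, decoding one chunk of bytes at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # Flush the decoder; raises if the input ended mid-sequence.
            decoder.decode(b"", final=True)
            break
        # The incremental decoder buffers any multi-byte sequence that is
        # split across a chunk boundary and emits it with the next chunk.
        text = decoder.decode(chunk)
        total += sum(1 for ch in text if ord(ch) > 0xFFFF)
    return total


# Tiny chunks force the 4-byte sixteenth note to straddle a chunk boundary.
data = ("abc" * 1000 + "\U0001D161").encode("utf-8")
found = count_non_bmp(io.BytesIO(data), chunk_size=7)
print(found)
```

The point of the design is that peak memory is bounded by the chunk size rather than by the size of the whole string, at the cost of giving up random access by character index.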

>> It's fun to think about, though I don't think any use at all of 32-
>> bit string chars would be realistic without a major version increment
>> or a fork.
>
> Is that true?  Well, yes... probably on the XS interface level
> some things need to be changed or extended.

That would certainly be true as well.

What I was getting at, though, was that a sudden, dramatic increase  
in worst-case-scenario RAM requirements shouldn't be considered  
backwards compatible.

> Nice hackathon subject, by the way.

Sounds like fun.  :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




