develooper Front page | perl.perl5.porters | Postings from March 2010

Re: Another optimization question: bsearch()

Thread Previous | Thread Next
From:
karl williamson
Date:
March 14, 2010 12:28
Subject:
Re: Another optimization question: bsearch()
Message ID:
4B9D3897.9050607@khwilliamson.com
Nicholas Clark wrote:
> On Thu, Mar 11, 2010 at 01:15:49PM -0700, karl williamson wrote:
>> Jan Dubois wrote:
>>> On Mon, 08 Mar 2010, Nicholas Clark wrote:
>>>> However, even 7-14Mb doesn't scare me. On any demand paged OS, if it's not
>>>> used, it's not paged in. I doubt that we have any platform where we're 
>>>> that
>>>> tight on virtual memory space for executables.
>>> It is also a little bit of an issue for people who distribute stand-alone
>>> applications written in Perl and bundled into a single executable with 
>>> tools
>>> like PAR, PerlApp or Perl2Exe.
> 
> That was one case I hadn't considered.
> 
>>> Right now you can fine tune the size of your bundled Unicode support by
>>> for example removing the 4 big tables for the major Asian scripts and
>>> still have Unicode support for all the European languages.  Having just one
>>> huge table would likely make this impossible.
>>>
>>> Having such a big footprint is also a huge drag on application startup
>>> time if they have to extract the table to disk every time the application 
>>> is
>>> run (yes, you can leave the files behind in a cache area, but some people
>>> prefer not to leave stuff behind when the application exits).
>>>
>> As a compromise measure, couldn't these tables be dynamically loaded on 
>> demand?  I would guess that Perl already has a mechanism to do this, but 
>> if not, a malloc and copy from disk would work.
> 
> For that a read only mmap() would be better. It allows the memory to be
> shared between processes (when the files are not extracted temporaries)
> 
>>> I would also be curious how good those 7-14MB could be compressed, as that
>>> too would make a big difference for the actual filesize of packaged apps.
> 
> There are some lovely suggestions for tight data representations in the
> link that Ævar mailed to the list last night:
> 
>     http://swtch.com/~rsc/regexp/regexp3.html
> 
> I particularly liked the concept of doing Unicode classes as DFAs on the
> octets of the UTF-8 representations of the code points. That might be a win
> for us.

But notice that he only implements two Unicode properties, gc and sc. 
This is likely much faster than our mechanism, but I think it takes 
significantly more space.

And it appears that the case insensitive matching isn't the full set, 
which Perl purports to use, but it is so buggy as to be close to useless.
> 
> (Independent of the more radical ideas of adaptively choosing a DFA or an NFA
> based on the feature complexity of the pattern compiled)
> 
> (It also references this insane idea:
> http://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html
> A technique with a stated caveat of "This would drive valgrind nuts.")
> 
> 
> Nicholas Clark
> 


Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About