develooper Front page | perl.perl5.porters | Postings from March 2010

Re: Another optimization question: bsearch()

Thread Previous | Thread Next
Nicholas Clark
March 12, 2010 08:26
Re: Another optimization question: bsearch()
Message ID:
On Thu, Mar 11, 2010 at 01:15:49PM -0700, karl williamson wrote:
> Jan Dubois wrote:
> >On Mon, 08 Mar 2010, Nicholas Clark wrote:
> >>However, even 7-14Mb doesn't scare me. On any demand paged OS, if it's not
> >>used, it's not paged in. I doubt that we have any platform where we're 
> >>that
> >>tight on virtual memory space for executables.
> >
> >It is also a little bit of an issue for people who distribute stand-alone
> >applications written in Perl and bundled into a single executable with 
> >tools
> >like PAR, PerlApp or Perl2Exe.

That was one case I hadn't considered.

> >Right now you can fine tune the size of your bundled Unicode support by
> >for example removing the 4 big tables for the major Asian scripts and
> >still have Unicode support for all the European languages.  Having just one
> >huge table would likely make this impossible.
> >
> >Having such a big footprint is also a huge drag on application startup
> >time if they have to extract the table to disk every time the application 
> >is
> >run (yes, you can leave the files behind in a cache area, but some people
> >prefer not to leave stuff behind when the application exits).
> >
> As a compromise measure, couldn't these tables be dynamically loaded on 
> demand?  I would guess that Perl already has a mechanism to do this, but 
> if not, a malloc and copy from disk would work.

For that a read only mmap() would be better. It allows the memory to be
shared between processes (when the files are not extracted temporaries)

> >I would also be curious how good those 7-14MB could be compressed, as that
> >too would make a big difference for the actual filesize of packaged apps.

There are some lovely suggestions for tight data representations in the
link that Ævar mailed to the list last night:

I particularly liked the concept of doing Unicode classes as DFAs on the
octets of the UTF-8 representations of the code points. That might be a win
for us.

(Independent of the more radical ideas of adaptively choosing a DFA or an NFA
based on the feature complexity of the pattern compiled)

(It also references this insane idea:
A technique with a stated caveat of "This would drive valgrind nuts.")

Nicholas Clark

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About