
RE: blead now has inversion lists for Unicode properties; ? AnyUnicode performance benchmarks ?

From: vadim.konovalov
Date: January 18, 2012 04:06
Subject: RE: blead now has inversion lists for Unicode properties; ? AnyUnicode performance benchmarks ?
Message ID: 35BF8D9716175C43BB9D67CA60CC345E3A156837@FRMRSSXCHMBSC2.dc-m.alcatel-lucent.com

> From: Karl Williamson

> One way the old scheme (and still in the current) coped with the linear
> search performance is by permanently storing (in a hash) the results of
> the search, so that it is never performed again.  It also at the same
> time stored in the same hash value the results of searching for the 63
> adjacent code points (a bit each), so that a request for any of them
> would not go through the linear search.  For example, if you searched to
> see if code point 130 matches a property, it would at the same time
> figure out whether or not all code points from 128 through 128+63 match
> the property.  This is not a cache as there is no provision for limiting
> the size of the hash.  If in the course of your program execution, you
> match code points 0, 63, 127, ... the hash will keep growing, and you
> will run out of memory at some point.  This is still the case.
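
The scheme quoted above can be sketched roughly as follows. This is only
an illustration in Python, not Perl's actual C implementation: the names
make_lookup, linear_search and matches are hypothetical, and the property
is assumed to be given as a list of inclusive (lo, hi) code point ranges.

```python
BLOCK_BITS = 6                 # assumption: 2**6 = 64 code points per hash entry
BLOCK_SIZE = 1 << BLOCK_BITS


def make_lookup(ranges):
    """Return (matches, cache) for a property given as inclusive (lo, hi) ranges.

    Hypothetical helper for illustration; not a Perl API.
    """
    cache = {}  # block start -> 64-bit bitmap; entries are never evicted

    def linear_search(cp):
        # The slow path whose result the hash is meant to remember.
        for lo, hi in ranges:
            if lo <= cp <= hi:
                return True
        return False

    def matches(cp):
        base = cp & ~(BLOCK_SIZE - 1)      # e.g. 130 -> 128
        bits = cache.get(base)
        if bits is None:
            # First request in this block: resolve all 64 neighbours at once,
            # storing one bit per code point.
            bits = 0
            for offset in range(BLOCK_SIZE):
                if linear_search(base + offset):
                    bits |= 1 << offset
            cache[base] = bits
        return bool((bits >> (cp - base)) & 1)

    return matches, cache


matches, cache = make_lookup([(128, 140), (0x400, 0x4FF)])
print(matches(130), len(cache))   # first lookup fills the block starting at 128
print(matches(191), len(cache))   # same block, answered from the stored bitmap
print(matches(192), len(cache))   # next block, so a second entry is added
```

Because nothing is ever removed from the dictionary, it gains one entry for
every distinct 64-code-point block that is ever queried.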

Why could memory exhaustion ever happen here?

Is this hash kept per regular expression?

As far as I understand it, this is a hash keyed by code point, mapping to
something else, so in the worst case it can never hold more entries than
there are code points, even when matching huge strings.

So, as I read your explanation, it will stop growing at some point.
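
To put a number on that bound, here is a small back-of-the-envelope
calculation in Python. It rests on assumptions not stated in the thread:
that only code points in the Unicode range 0..0x10FFFF are ever matched,
that each hash entry covers 64 code points with one bit each, and that
key and per-entry hash overhead are ignored. If larger code point values
are allowed, the bound grows accordingly.

```python
MAX_CODE_POINT = 0x10FFFF          # assumption: Unicode range only
BLOCK_SIZE = 64                    # code points covered by one hash entry
BITMAP_BYTES = BLOCK_SIZE // 8     # one bit per code point

total_code_points = MAX_CODE_POINT + 1
# Ceiling division: worst case is one entry for every 64-code-point block.
max_entries = -(-total_code_points // BLOCK_SIZE)
max_payload_bytes = max_entries * BITMAP_BYTES

print(max_entries)         # 17408 entries
print(max_payload_bytes)   # 139264 bytes of bitmap data
```

Under these assumptions the hash for one property tops out at 17408
entries, which is a fixed ceiling rather than unbounded growth.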

Otherwise, could you please point me directly to a place in the Perl source
where this hash is used, so that I can get an idea of how it is constructed?

Vadim.
