develooper Front page | perl.perl5.porters | Postings from August 2013

Re: Compiled-in POSIX character class inversion lists are now fully const in blead

Karl Williamson
August 5, 2013 19:08
On 07/03/2013 10:04 PM, Karl Williamson wrote:
> The header file charclass_invlists.h contains the definitions for some
> of the POSIX character classes, such as [:xdigit:].  These are now
> declared as
> static const UV foo[]
> Not being fully const created problems for -DPERL_GLOBAL_STRUCT_PRIVATE,
> and it meant that these were not in the read-only text segment portion
> of the program.  Now multiple instances of Perl running the same
> executable can share these.  It appears to me from code reading that
> these also now aren't copied when the scalars containing them are dup'd,
> as SvLEN is set to 0 in those scalars.
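For readers unfamiliar with the data structure, here is a minimal sketch of a fully const inversion list and a membership test. It is not the code in charclass_invlists.h: the table is a simplified ASCII-only stand-in for [:xdigit:], the UV typedef is a placeholder for Perl's own, and the names xdigit_invlist and invlist_contains are hypothetical. An inversion list is a sorted array in which each element starts a range that is alternately inside and outside the set.

```c
#include <stddef.h>

typedef unsigned long UV;   /* placeholder for Perl's UV; an assumption */

/* Hypothetical ASCII-only table: even indices start a range that is IN
 * the set, odd indices start a range that is OUT of it.  Because the
 * array is static const, a compiler can place it in read-only storage. */
static const UV xdigit_invlist[] = {
    0x30, 0x3A,   /* '0'..'9' */
    0x41, 0x47,   /* 'A'..'F' */
    0x61, 0x67    /* 'a'..'f' */
};

/* Return nonzero if cp is in the set described by list[0..len-1]. */
static int invlist_contains(const UV *list, size_t len, UV cp)
{
    size_t lo = 0, hi = len;

    /* Binary search for the number of elements <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* An odd count means cp falls at or after an "in" boundary and
     * before the next "out" boundary. */
    return (lo & 1) != 0;
}
```

With this table, invlist_contains(xdigit_invlist, 6, 'f') is nonzero and invlist_contains(xdigit_invlist, 6, 'g') is zero. The lookup only reads the array, which is what allows the table to be const in the first place.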

I had to revert this set of patches.  Other messages on the original 
thread indicated very strange behaviors where printing an expression 
from the debugger did not match printing it from the program.  I did not 
investigate this, other than to confirm the behavior.

But the bug turned out to be an off-by-1 error.  I fixed that and redid 
the patches, which got into 5.19.2, without problems so far.

No one responded to my original question posed below, so I'm re-asking it:
> To save memory, only the POSIX classes with smaller representations have
> been compiled-in.  The larger classes have only their Latin1 range
> values compiled.  If the program needs to access something outside that
> range, the appropriate tables must be loaded from disk.
> I'm thinking, as David Mitchell suggested some time ago, that this
> change should affect our calculation of which ones should get compiled.
> I think \d should definitely be compiled, and probably \w (the sizes
> being 84 and 1130 UVs respectively).
> The other candidates with the number of UVs they occupy in Unicode 6.2 are:
> alnum 1132
> alpha 1080
> graph 1088
> lower 1236
> print 1082
> punct 272
> upper 1220
> cased 238  (used internally, a combination of lower + upper)
> These numbers will grow in future Unicode releases.

Does anyone have any ideas about this?
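To put the trade-off in concrete terms, the following sketch totals the UV counts quoted above and converts them to bytes. The struct and function names are hypothetical, and the width of a UV is passed in as a parameter because it depends on the build (8 bytes is typical for 64-bit builds, which is an assumption here, not something stated in the thread).

```c
#include <stddef.h>

/* UV counts for Unicode 6.2 as listed in the message above. */
struct class_size {
    const char *name;
    size_t      uvs;
};

static const struct class_size candidates[] = {
    { "alnum", 1132 },
    { "alpha", 1080 },
    { "graph", 1088 },
    { "lower", 1236 },
    { "print", 1082 },
    { "punct",  272 },
    { "upper", 1220 },
    { "cased",  238 }
};

/* Sum the UV counts of the first n entries of t. */
static size_t total_uvs(const struct class_size *t, size_t n)
{
    size_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += t[i].uvs;
    return sum;
}

/* Bytes needed to compile in the first n entries, given the UV width. */
static size_t total_bytes(const struct class_size *t, size_t n,
                          size_t uv_width)
{
    return total_uvs(t, n) * uv_width;
}
```

Compiling in all eight candidates comes to 7348 UVs, or 58784 bytes with 8-byte UVs, on top of the 1214 UVs for \d and \w.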
