On 07/03/2013 10:04 PM, Karl Williamson wrote:
> The header file charclass_invlists.h contains the definitions for some
> of the POSIX character classes, such as [:xdigit:]. These are now
> declared as
>
>     static const UV foo[]
>
> Not being fully const created problems for -DPERL_GLOBAL_STRUCT_PRIVATE,
> and it meant that these were not in the read-only text segment portion
> of the program. Now multiple instances of Perl running the same
> executable can share these. It appears to me from code reading that
> these also now aren't copied when the scalars containing them are dup'd,
> as SvLEN is set to 0 in those scalars.

I had to revert this set of patches. Other messages on the original
thread reported very strange behavior, where printing an expression
from the debugger did not match printing it from the program. I did not
investigate beyond confirming the behavior, but the bug turned out to be
an off-by-one error. I fixed that and redid the patches, which made it
into 5.19.2 and have caused no problems so far.

No one responded to the question I originally posed below, so I'm
re-asking it:

>
> To save memory, only the POSIX classes with smaller representations have
> been compiled-in. The larger classes have only their Latin1 range
> values compiled. If the program needs to access something outside that
> range, the appropriate tables must be loaded from disk.
>
> I'm thinking, as David Mitchell suggested some time ago, that this
> change should affect our calculation of which ones should get compiled.
> I think \d should definitely be compiled, and probably \w (the sizes
> being 84 and 1130 UVs respectively).
>
> The other candidates, with the number of UVs they occupy in Unicode 6.2, are:
>
>     alnum  1132
>     alpha  1080
>     graph  1088
>     lower  1236
>     print  1082
>     punct   272
>     upper  1220
>     cased   238  (used internally, a combination of lower + upper)
>
> These numbers will grow in future Unicode releases.

Does anyone have any ideas about this?
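For readers who haven't looked inside charclass_invlists.h, here is a minimal sketch in C of how a static const inversion list can answer "is this code point in the class?", and how a table that only covers the Latin1 range has to defer anything above 0xFF to a fuller table. It assumes the usual inversion-list convention (even-indexed elements start a range in the set, odd-indexed ones start a range outside it). The names xdigit_invlist, invlist_search, invlist_contains and lookup_result are hypothetical illustrations, not Perl's internal API, and the typedef of UV is only a stand-in for Perl's own type. The xdigit ranges assume the Unicode Hex_Digit property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Perl's UV: an unsigned integer wide enough for any code point. */
typedef uint64_t UV;

/* Illustrative inversion list for [:xdigit:] (assumed Unicode Hex_Digit ranges).
 * Being static const, it can live in read-only storage shared by every process
 * running the same executable. */
static const UV xdigit_invlist[] = {
    0x30,   0x3A,   /* 0-9 */
    0x41,   0x47,   /* A-F */
    0x61,   0x67,   /* a-f */
    0xFF10, 0xFF1A, /* FULLWIDTH DIGIT ZERO..NINE */
    0xFF21, 0xFF27, /* FULLWIDTH LATIN CAPITAL LETTER A..F */
    0xFF41, 0xFF47  /* FULLWIDTH LATIN SMALL LETTER A..F */
};

typedef enum { NOT_IN_SET = 0, IN_SET = 1, NEED_FULL_TABLE = 2 } lookup_result;

/* Binary search: return the index of the largest element <= cp,
 * or -1 if cp is below the first element (or the list is empty). */
static ptrdiff_t invlist_search(const UV *list, size_t len, UV cp)
{
    assert(list != NULL || len == 0);
    if (len == 0 || cp < list[0])
        return -1;

    size_t lo = 0, hi = len;  /* invariant: list[lo] <= cp, and hi == len or list[hi] > cp */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    return (ptrdiff_t) lo;
}

/* coverage_limit is the first code point the table does NOT describe,
 * e.g. 0x100 for a table holding only the Latin1 range. Anything at or
 * above it must be answered from a fuller table (loaded from disk). */
static lookup_result invlist_contains(const UV *list, size_t len, UV cp, UV coverage_limit)
{
    if (cp >= coverage_limit)
        return NEED_FULL_TABLE;
    ptrdiff_t i = invlist_search(list, len, cp);
    return (i >= 0 && i % 2 == 0) ? IN_SET : NOT_IN_SET;
}
```

Used with all 12 elements and a limit of 0x110000, the list answers any code point; used with only its first 6 elements and a limit of 0x100, it behaves like a Latin1-only compiled-in table, so a query for U+FF21 returns NEED_FULL_TABLE instead of a wrong answer. The binary search keeps lookups at O(log n), so the cost of compiling in a larger class is mainly the memory for its UVs, which is the trade-off the size figures above are about.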