I was trying to benchmark the effects of moving to a binary search instead of a swash hash, as was discussed in the recent thread entitled "Seeking guidance on Benchmarking and potentially removing swash caching". I tried an implementation in which I just added a key, "USE_INVLIST", to the swash hash. Each time a character was looked up in the hash, it would first check whether an element with that key exists, and if so use the swash's inversion list (stored under the key "INVLIST") instead of its normal swash behavior. I was surprised to find that this was twice as slow as the original. Eventually, I used gprof to pin down the reason why, and got the following results.

First, the old way:

  %   cumulative   self              self     total
 time   seconds   seconds      calls  s/call  s/call  name
32.01     18.39    18.39  200863004    0.00    0.00  Perl_hv_common
21.72     30.88    12.48  200854008    0.00    0.00  Perl_swash_fetch

Then, the new:

  %   cumulative   self              self     total
 time   seconds   seconds      calls  s/call  s/call  name
60.69     71.47    71.47  400679084    0.00    0.00  Perl_hv_common
 6.07     78.62     7.15  200854008    0.00    0.00  Perl_swash_fetch

There were about twice as many calls to hv_common, but the time spent in it nearly quadrupled. I eventually tried shortening the keys listed above to one byte each, and got this:

  %   cumulative   self              self     total
 time   seconds   seconds      calls  s/call  s/call  name
47.40     41.78    41.78  400682134    0.00    0.00  Perl_hv_common
 8.70     49.45     7.67  200854008    0.00    0.00  Perl_swash_fetch

It appears to me that the length of the hash keys had an outsized impact on performance.

(I eventually went to an implementation which avoids using extra hash keys altogether, and the binary search outperformed the hash by about 37% when the inversion list was 64 elements or fewer, so it uses the binary search for shorter lists and the hash for longer ones. The benchmark was on a Korean text using the \X regex match, which is quite complicated for Korean.)
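In case a concrete picture helps, below is a minimal standalone C sketch of the kind of inversion-list membership test I mean. It is not the code I benchmarked; the typedef and the function name (invlist_contains) are made up for illustration, and Perl's real inversion lists sit behind core APIs rather than a bare array. The point it is meant to show is that a lookup is just a binary search over a sorted array of range-start code points, with no hash access at all.

/* Minimal standalone sketch of an inversion-list membership test.
 * NOT the core code; names and layout are illustrative only.
 * An inversion list is a sorted array of code points at which
 * membership flips: even indices start ranges that are in the set,
 * odd indices start ranges that are not. */

#include <stddef.h>
#include <stdio.h>

typedef unsigned long UV;   /* stand-in for Perl's UV */

/* Return 1 if cp is in the set described by the inversion list,
 * 0 otherwise.  Binary-search for the last index whose element is
 * <= cp; the parity of that index decides membership. */
static int
invlist_contains(const UV *invlist, size_t len, UV cp)
{
    size_t lo = 0, hi = len;

    if (len == 0 || cp < invlist[0])
        return 0;

    /* Narrow to the last index i such that invlist[i] <= cp */
    while (lo + 1 < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (invlist[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }

    /* Even index => inside a range that belongs to the set */
    return (lo % 2) == 0;
}

int
main(void)
{
    /* The set { U+0041..U+005A, U+0061..U+007A } (ASCII letters):
     * ranges begin at 0x41 and 0x61 and end just before 0x5B and 0x7B */
    const UV ascii_alpha[] = { 0x41, 0x5B, 0x61, 0x7B };

    printf("%d\n", invlist_contains(ascii_alpha, 4, 'A'));   /* 1 */
    printf("%d\n", invlist_contains(ascii_alpha, 4, '['));   /* 0 */
    printf("%d\n", invlist_contains(ascii_alpha, 4, 'z'));   /* 1 */
    printf("%d\n", invlist_contains(ascii_alpha, 4, 0x7B));  /* 0 */
    return 0;
}

A test like this only touches a small contiguous sorted array, whereas the swash path goes through Perl_hv_common on every lookup, which is where the profiles above say the time goes.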