On Tue Oct 01 02:05:19 2013, hv wrote:
>
> I'm still confused about the intent of this twice-repeated mantra:
>     if (! (ANYOF_FLAGS(data.start_class) & ANYOF_EMPTY_STRING)
>         && ! ssc_is_anything(data.start_class))
>
> Given that when the first clause is true, ssc_is_anything() immediately
> returns FALSE, isn't this in both cases the same as:
>     if (! (ANYOF_FLAGS(data.start_class) & ANYOF_EMPTY_STRING))
> ?
>
> I think there'd be value in adding some brief comments about the intent
> around these checks.

This has now been changed by

commit b35552de5cea8eb47ccb046284ecb9a099430255
Author: Karl Williamson <khw@cpan.org>
Date:   Mon Sep 22 13:59:39 2014 -0600

    Tighten uses of regex synthetic start class

    A synthetic start class (SSC) is generated by the regular expression
    pattern compiler to give a consolidation of all the possible things
    that can match at the beginning of where a pattern can possibly
    match.  For example
        qr/a?bfoo/;
    requires the match to begin with either an 'a' or a 'b'.  There are
    no other possibilities.  We can set things up to quickly scan for
    either of these in the target string, and only when one of these is
    found do we need to look for 'foo'.

    There is an overhead associated with using SSCs.  If the number of
    possibilities that the SSC excludes is relatively small, it can be
    counter-productive to use them.

    This patch creates a crude sieve to decide whether to use an SSC or
    not.  If the SSC doesn't exclude at least half the "likely"
    possibilities, it is discarded.  This patch is a starting point, and
    can be refined if necessary as we gain experience.

    See thread beginning with
    http://nntp.perl.org/group/perl.perl5.porters/212644

    In many patterns, no SSC is generated; and with the advent of tries,
    SSC's have become less important, so whatever we do is not terribly
    critical.

=================================

The code now reads

    if ((!(r->anchored_substr || r->anchored_utf8) || r->anchored_offset)
        && stclass_flag
        && ! (ANYOF_FLAGS(data.start_class) & SSC_MATCHES_EMPTY_STRING)
        && is_ssc_worth_it(pRExC_state, data.start_class))
    {

=====================

>
> I note also that the new test ends up applying a rather pessimal
> optimization:
> % ./perl -Ilib -Mre=debug -we '"" =~ /^A*\z/ or die;'
> Compiling REx "^A*\z"
> Final program:
>    1: BOL (2)
>    2: STAR (5)
>    3:   EXACT <A> (0)
>    5: EOS (6)
>    6: END (0)
> floating ""$ at 0..2147483647 (checking floating) anchored(BOL) minlen 0
> Matching REx "^A*\z" against ""
> Found floating substr ""$ at offset 0...
> Guessed: match at offset 0
> [...]
>
> Hugo

Later in the thread we concluded that this optimisation was unchanged
from before, and needs a different ticket.  I don't know if that ever
got filed.

-- 
Karl Williamson

---
via perlbug:  queue: perl5 status: resolved
https://rt.perl.org/Ticket/Display.html?id=120041