On Wed, Aug 24, 2016 at 10:08:17AM -0600, Karl Williamson wrote: > But in examining things, I do have a concern. Dave added a > 'increment_locinput:' label in S_regmatch() in 28b98f76, but many of the > cases bypass that. The code at that label allows the input to go one off > the end. I'm wondering if more of the cases should goto that label instead > of incrementing locinput themselves and doing a 'break'. I have some vague > recollection around when one was supposed to do that, but I would have made > it a comment so it would be plain, and I don't see any such comment. It > might have been a different function(). Its a while ago now; but IIRC this was part of my work to make the regex engine not read past the end of the string (including the not reading the implied \0). In particular, memory mapped stings were problematical: the regex engine could trigger a page fault. I suspect that 'goto increment_locinput' is probably generally the better bet, *but only when* *locinput == nextchr, since increment_locinput assumes that. I would speculate that some of the places that do their own increment do it because: *locinput is not equal to nextchr, or they already have a "if (utf8_target) { X } { else Y }" block of code so it's less efficient to test the condition again. I can't remember why its safe to be 1 byte over, but not 2+, and thus can't say off the top of my head whether other branches are bad for not doing that test. -- That he said that that that that is is is debatable, is debatable.Thread Previous | Thread Next