On 7 March 2016 at 19:16, Ed Avis <perlbug-followup@perl.org> wrote:
> # New Ticket Created by "Ed Avis"
> # Please include the string: [perl #127670]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=127670 >
>
>
> This is a bug report for perl from eda@waniasset.com,
> generated with the help of perlbug 1.40 running under perl 5.22.1.
>
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> When doing a search-and-replace you may wrap the regular expression in
> \b anchors to stop it matching in the middle of a word. s/red/green/g
> will change credit to cgreenit but s/\bred\b/green/g does not have
> this bug.
>
> However, you may not know ahead of time whether your source regexp is
> itself a word. If you unconditionally wrap it in \b anchors then that
> in turn will break if the start or end is not a word character.
>
>     /\b x[(][)] \b/x  # will fail to match 'x()-1' or 'x()'
>
> What you need to do instead is something like
>
>     say 'please enter source and replacement strings:';
>     chomp (my $source = <>);
>     chomp (my $replacement = <>);
>     while (<>) {
>         s/(?:\b|(?!\w))\Q$source\E(?:\b|(?<!\w))/$replacement/g
>             && print "replaced: $_";
>     }
>
> These (?:\b|(?!\w)) and (?:\b|(?<!\w)) incantations are useful
> enough that they deserve their own anchor.

I don't know about that. It is not clear to me that it *is* actually so
useful or commonplace, or so complete a solution to the underlying
problem, that it is worth taking up an escape sequence.

On the other hand, a better solution for this would be useful.

> Rather than matching only
> at a word boundary, it would match only at a point that is not in the
> middle of a word. That could be a word boundary or it could just be
> some point in between two non-word characters.
> In other words the
> new anchor matches
>
> - at start of string
> - at end of string
> - when either or both of the surrounding characters are \W
>
> (Subjective experience: this has come up a couple of times, and the
> 'solution' of wrapping a regexp in \b anchors is obvious and only
> subtly wrong, so I do think this would help avoid a common regular
> expression bug, and falls under "easy things should be easy".)
>
> FWIW, the different way \b{wb} works means that it does not
> suffer from this problem. Normally you can wrap an arbitrary regexp
> in \b{wb} anchors and it will match only when not partway through
> a word at start or end. So this might argue for steering users
> towards \b{wb} instead of \b.

That is interesting.

I would like to hear more opinions from people on how important they
think this is.

Yves
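
For anyone who wants to try the behaviour under discussion, here is a
minimal sketch comparing plain \b wrapping with the lookaround guards
from the report. The helper name not_mid_word is hypothetical (it is
not a Perl built-in, and not the proposed anchor itself), the guards
are written with single backslashes as they would appear inside a
regex literal, and the test strings are only illustrations.

```perl
use strict;
use warnings;

# Hypothetical helper, not a Perl built-in: compile a pattern that matches
# $literal only where neither end falls in the middle of a word.
sub not_mid_word {
    my ($literal) = @_;
    return qr/(?:\b|(?!\w))\Q$literal\E(?:\b|(?<!\w))/;
}

# [ literal to find, text to search ]
my @cases = (
    [ 'red', 'credit' ],       # inside a word: neither form should match
    [ 'red', 'a red car' ],    # a whole word: both forms should match
    [ 'x()', 'x()-1' ],        # ends in \W: plain \b wrapping misses it
    [ 'x()', 'max()' ],        # starts mid-word: neither form should match
);

for my $case (@cases) {
    my ($literal, $text) = @$case;
    my $guard   = not_mid_word($literal);
    my $plain   = $text =~ /\b\Q$literal\E\b/ ? 'match' : 'no match';
    my $guarded = $text =~ $guard            ? 'match' : 'no match';
    printf "%-5s in %-11s \\b...\\b: %-9s guarded: %s\n",
        "'$literal'", "'$text'", $plain, $guarded;
}
```

The only case where the two forms should disagree is 'x()' in 'x()-1',
which is the situation the report describes: the literal ends in a
non-word character, so the trailing \b cannot match there.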