[perl #76604] Inaccurate repetition examples in re tutorial docs

David Olsson
July 21, 2010 02:09
# New Ticket Created by  David Olsson 
# Please include the string:  [perl #76604]
# in the subject line of all future correspondence about this issue. 

Message-Id: <5.10.1_4048_1279645323@D-097-DOLSSON>

This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl 5.10.1.

Though I am running an older Perl, this bug applies to the
current Perl documentation.

The regular expression tutorial documents -- perlrequick and
perlretut -- provide inaccurate examples of matching repeated

At, reference

These sections provide similar examples of parsing year strings.
In perlrequick:

$year =~ /\d{2,4}/; # make sure year is at least 2 but not more
                    # than 4 digits
$year =~ /\d{4}|\d{2}/; # better match; throw out 3 digit dates

Either one of these expressions will match any string that
contains two or more consecutive digits. To match only the
digit counts implied by the comments, the expressions need to
be anchored to something that is not a digit. Simplest might
be the beginning and end of the string:

$year =~ /^\d{2,4}$/;       # 2, 3, or 4 digits
$year =~ /^\d{4}$|^\d{2}$/; # 2 or 4 digits (4 preferred)
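
To make the difference concrete, here is a minimal sketch that runs
the unanchored and anchored patterns side by side. The sub names and
the sample strings are made up for illustration and do not come from
the perl documentation.

```perl
use strict;
use warnings;

# Unanchored patterns, as quoted from perlrequick above.
sub loose_range  { return $_[0] =~ /\d{2,4}/     ? 1 : 0 }
sub loose_alt    { return $_[0] =~ /\d{4}|\d{2}/ ? 1 : 0 }

# Anchored versions proposed in this report.
sub strict_range { return $_[0] =~ /^\d{2,4}$/       ? 1 : 0 }
sub strict_alt   { return $_[0] =~ /^\d{4}$|^\d{2}$/ ? 1 : 0 }

# Sample inputs are hypothetical, chosen only for illustration.
for my $year ('99', '123', '2010', '12345', 'abc') {
    printf "%-6s unanchored: %d %d   anchored: %d %d\n", $year,
        loose_range($year), loose_alt($year),
        strict_range($year), strict_alt($year);
}
```

With these inputs, '12345' passes both unanchored tests but fails
both anchored ones, and '123' is rejected only by the anchored
alternation.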

In the second example, /^\d{4}|\d{2}$/ would NOT be accurate,
because the first alternative binds only to the beginning
of the string and the second alternative binds only to the
end of the string.
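
The precedence issue can be checked with a short sketch like the
following. The variable names and inputs are hypothetical; the
grouped form is an equivalent spelling I am adding for comparison,
not something taken from the tutorial.

```perl
use strict;
use warnings;

# Alternation has the lowest precedence, so this reads as
# (^\d{4}) OR (\d{2}$): each alternative has only one anchor.
my $half_anchored = qr/^\d{4}|\d{2}$/;

# Each alternative carries both anchors.
my $fully_anchored = qr/^\d{4}$|^\d{2}$/;

# Same meaning, with the anchors factored out of a non-capturing group.
my $grouped = qr/^(?:\d{4}|\d{2})$/;

# Hypothetical inputs, chosen only for illustration.
for my $input ('2010', '20101', 'x99', '123') {
    printf "%-6s half: %-5s full: %-5s grouped: %s\n", $input,
        ($input =~ $half_anchored  ? 'yes' : 'no'),
        ($input =~ $fully_anchored ? 'yes' : 'no'),
        ($input =~ $grouped        ? 'yes' : 'no');
}
```

Only '2010' is accepted by all three patterns here; the
half-anchored pattern also accepts '20101', 'x99', and '123'.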

If the purpose were to extract a year numeral from anywhere
in the string, the expression might bind to word boundaries,
or, perhaps best, to either the string edges or non-digits:

$year =~ /(?:^|\D)(\d{4}|\d{2})(?:$|\D)/

In list context, this expression also returns the extracted
year instead of 1 for a match.  But we're probably past what
we want to put in a tutorial.  I would just like to see the
examples made accurate, as above.
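
For completeness, a minimal sketch of the extraction case. The
helper name extract_year and the sample strings are hypothetical.
The capture is only returned because the match is evaluated in list
context.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first 2- or 4-digit run that is
# not adjacent to other digits, or undef when there is none.
sub extract_year {
    my ($text) = @_;
    # The parentheses around $year force list context, so the match
    # returns the captured group rather than a true/false value.
    my ($year) = $text =~ /(?:^|\D)(\d{4}|\d{2})(?:$|\D)/;
    return $year;
}

# Hypothetical inputs, chosen only for illustration.
for my $text ('released in 2010.', "copyright '10", 'id 12345', 'room 123') {
    my $year = extract_year($text);
    printf "%-20s => %s\n", $text, defined $year ? $year : '(none)';
}
```

Note that the pattern takes the leftmost qualifying run, so a string
with several 2-digit or 4-digit numbers would need more context to
pick out the year.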

Thank you!
