perl.perl5.porters | Postings from July 2010

[perl #76604] Inaccurate repetition examples in re tutorial docs

From: David Olsson
Date: July 21, 2010 02:09
Subject: [perl #76604] Inaccurate repetition examples in re tutorial docs
Message ID: rt-3.6.HEAD-11314-1279648211-606.76604-75-0@perl.org
# New Ticket Created by  David Olsson 
# Please include the string:  [perl #76604]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=76604 >


Message-Id: <5.10.1_4048_1279645323@D-097-DOLSSON>

This is a bug report for perl from davidolsson@yahoo.com,
generated with the help of perlbug 1.39 running under perl 5.10.1.


-----------------------------------------------------------------
Though I am running an older Perl, this bug applies to the
current Perl documentation.

The regular expression tutorial documents -- perlrequick and
perlretut -- provide inaccurate examples of matching repeated
expressions.

At perldoc.perl.org, reference

http://perldoc.perl.org/perlrequick.html#Matching-repetitions
and
http://perldoc.perl.org/perlretut.html#Matching-repetitions

These sections provide similar examples of parsing year strings.
In perlrequick:

$year =~ /\d{2,4}/; # make sure year is at least 2 but not more
                    # than 4 digits
$year =~ /\d{4}|\d{2}/; # better match; throw out 3 digit dates

Either of these expressions will match any string that contains
two or more consecutive digits, including strings with five or
more digits. To match digits as the comments imply, the
expressions need to be anchored to something that is not a digit.
The simplest anchors might be the beginning and end of the string:

$year =~ /^\d{2,4}$/;       # 2, 3, or 4 digits
$year =~ /^\d{4}$|^\d{2}$/; # 2 or 4 digits (4 preferred)
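
To make the difference concrete, here is a minimal sketch that runs the
unanchored perlrequick pattern and the two anchored patterns against a
few strings. The sample strings and the %result hash are made up for
illustration and do not come from the tutorial.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ('7', '42', '123', '2010', '12345', 'abc1234567');

my %result;
for my $year (@samples) {
    $result{$year} = {
        # Unanchored pattern, as currently shown in perlrequick.
        loose       => ($year =~ /\d{2,4}/         ? 1 : 0),
        # Anchored: the whole string must be 2, 3, or 4 digits.
        anchored    => ($year =~ /^\d{2,4}$/       ? 1 : 0),
        # Anchored: the whole string must be exactly 2 or 4 digits.
        two_or_four => ($year =~ /^\d{4}$|^\d{2}$/ ? 1 : 0),
    };
    printf "%-12s loose=%d anchored=%d two_or_four=%d\n",
        $year, @{ $result{$year} }{qw(loose anchored two_or_four)};
}
```

The unanchored pattern accepts '12345' and 'abc1234567' because it only
needs to find two digits somewhere in the string; the anchored patterns
reject both.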

In the second example, /^\d{4}|\d{2}$/ would NOT be accurate,
because alternation has lower precedence than the anchors: the
first alternative is anchored only to the beginning of the
string and the second alternative is anchored only to the end
of the string.
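
A short sketch of that pitfall follows. The variable names and sample
strings are hypothetical, and the grouped form /^(?:\d{4}|\d{2})$/ is an
equivalent spelling I am adding for comparison; it is not in either
tutorial.

```perl
use strict;
use warnings;

my $wrong   = qr/^\d{4}|\d{2}$/;      # parsed as (^\d{4}) OR (\d{2}$)
my $right   = qr/^\d{4}$|^\d{2}$/;    # both alternatives fully anchored
my $grouped = qr/^(?:\d{4}|\d{2})$/;  # same meaning as $right (my addition)

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ('2010', '20101', 'abc99', '123');

my %matches;
for my $s (@samples) {
    # Record 1 or 0 for each pattern, in the order wrong, right, grouped.
    $matches{$s} = [ map { $s =~ $_ ? 1 : 0 } $wrong, $right, $grouped ];
    printf "%-6s wrong=%d right=%d grouped=%d\n", $s, @{ $matches{$s} };
}
```

Only '2010' should pass the fully anchored patterns, while the
half-anchored pattern accepts all four strings.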

If the purpose were to extract a year numeral from anywhere
in the string, the expression might be anchored to word
boundaries, or, perhaps best, to either the string edges or
non-digits:

$year =~ /(?:^|\D)(\d{4}|\d{2})(?:$|\D)/;

In list context, this expression also returns the extracted
year (the captured group, also available as $1) instead of 1
for a match.  But this is probably beyond what belongs in a
tutorial.  I would just like to see the examples made
accurate, as above.
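
For completeness, a minimal sketch of the extraction pattern in use.
The helper extract_year and the sample strings are hypothetical. A match
evaluated in list context returns the captured group, so the year can be
assigned directly.

```perl
use strict;
use warnings;

# Year delimited by a string edge or a non-digit on each side.
my $year_re = qr/(?:^|\D)(\d{4}|\d{2})(?:$|\D)/;

# Hypothetical helper: returns the captured year, or undef if none.
sub extract_year {
    my ($text) = @_;
    my ($year) = $text =~ $year_re;   # list context yields the capture
    return $year;
}

# Hypothetical sample inputs, chosen only for illustration.
for my $text ('Released in 2010.', "'98 was a good year",
              'build 12345', 'id 123 or 45') {
    my $year = extract_year($text);
    printf "%-22s => %s\n", $text, $year // 'no match';
}
```

Runs of three digits or of five or more digits are skipped, because the
digits must be bounded on both sides by a non-digit or a string edge.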

Thank you!
-----------------------------------------------------------------
---
Flags:
    category=docs
    severity=low
---
Site configuration information for perl 5.10.1:

Configured by rurban at Fri Dec 18 14:51:24 GMT 2009.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:

  Platform:
    osname=cygwin, osvers=1.7.0(0.21853),
archname=i686-cygwin-thread-multi-64int
    uname='cygwin_nt-5.1 reini 1.7.0(0.21853) 2009-12-04 17:08 i686 cygwin '
    config_args='-de -Dlibperl=cygperl5_10.dll -Dmksymlinks
-Dusethreads -Doptimize=-O3'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O3',
    cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.3.4 20090804 (release) 1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags =' -Wl,--enable-auto-import
-Wl,--export-all-symbols -Wl,--stack,8388608
-Wl,--enable-auto-image-base -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /lib
    libs=-lgdbm -ldb -ldl -lcrypt -lgdbm_compat
    perllibs=-ldl -lcrypt
    libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=cygperl5_10.dll
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags=' --shared  -Wl,--enable-auto-import
-Wl,--export-all-symbols -Wl,--stack,8388608
-Wl,--enable-auto-image-base -L/usr/local/lib -fstack-protector'

Locally applied patches:
    CYG11 no-bs
    CYG12 no archlib in otherlibdirs
    CYG14 Dynaloader
    CYG15 static-Win32CORE
    CYG17 utf8-paths
    CYG21 LibList-Kid.patch
    CYG22 cygwin-1.7 hints
    CYG23 544-stat
    CYG24 build man pages
    CYG26 Cwd for svk
    Bug#55162 File::Spec::case_tolerant performance
    disable ExtUtils::MakeMaker::Coverage in Sys-Syslog

---
@INC for perl 5.10.1:
    /usr/lib/perl5/5.10/i686-cygwin
    /usr/lib/perl5/5.10
    /usr/lib/perl5/site_perl/5.10/i686-cygwin
    /usr/lib/perl5/site_perl/5.10
    /usr/lib/perl5/vendor_perl/5.10/i686-cygwin
    /usr/lib/perl5/vendor_perl/5.10
    /usr/lib/perl5/vendor_perl/5.10
    /usr/lib/perl5/site_perl/5.8
    /usr/lib/perl5/vendor_perl/5.8
    .

---
Environment for perl 5.10.1:
    HOME=/home/dolsson
    LANG=C.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/bin:/cygdrive/c/Program Files/Common
Files/Microsoft Shared/Windows
Live:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/Program
Files/Intel/DMIX:/cygdrive/c/Program
Files/QuickTime/QTSystem/:/cygdrive/c/WINDOWS/system32/WindowsPowerShell/v1.0:/cygdrive/c/Program
Files/SlikSvn/bin/:/cygdrive/c/Python25:/cygdrive/c/Program
Files/Common Files/Microsoft Shared/Windows
Live:/cygdrive/c/Python26:/cygdrive/c/Program
Files/Vim/vim72:/cygdrive/c/Program Files/AutoIt3:/cygdrive/c/Program
Files/Google/google_appengine/:/cygdrive/c/Documents and
Settings/dolsson/My
Documents/install/xpdf-3.02pl4-win32:/usr/lib/lapack
    PERL_BADLANG (unset)
    SHELL (unset)
