develooper Front page | perl.perl5.porters | Postings from September 2014

[perl #122853] Guarantee 0-9, A-Z, a-z character classes

Thread Next
From:
Ed Avis
Date:
September 26, 2014 08:49
Subject:
[perl #122853] Guarantee 0-9, A-Z, a-z character classes
Message ID:
rt-4.0.18-3300-1411721371-701.122853-75-0@perl.org
# New Ticket Created by  "Ed Avis" 
# Please include the string:  [perl #122853]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=122853 >


For a long time Perl programmers have used [A-Z] in regexps to match
an uppercase letter and [a-z] for lowercase.  For almost as long,
there have been admonishments in various places that these constructs
won't work on non-ASCII systems where the letters of the alphabet
don't have consecutive character codes, and that something like \w or
[[:upper:]] should be used instead.

However, \w changes behaviour depending on locale and the /a and /aa
modifiers; [[:upper:]] and friends seem okay, but not everyone will
be familiar with the POSIX spec or know where to look them up.
A-Z and a-z seem like a clearer way to say what you mean.

There is also the question of whether to use 0-9 instead of \d.
Again, \d changes based on locale or ASCII flags, while 0-9 always
matches exactly the ten digits and nothing else.  However there's a
nagging feeling that some system out there might (in principle) not
use consecutive character codes for the digits, so it might not work.

This bug report is to request that Perl guarantee in its documentation
the following equivalences:

   [A-Z]    [ABCDEFGHIJKLMNOPQRSTUVWXYZ]
   [a-z]    [abcdefghijklmnopqrstuvwxyz]
   [0-9]    [0123456789]

If the EBCDIC port is still active, then some programming work might
be needed to make sure these ranges do as documented on EBCDIC.
That will have the nice side-effect of making plenty of existing
code that does use A-Z work correctly on EBCDIC systems.

If there are currently no Perl ports to non-ASCII systems (and there
aren't likely to be any new ones in the future), then no code or
behaviour change is needed, just a note in the documentation.

Note that currently perlre does talk about

    "\w" means the 63 characters "[A-Za-z0-9_]"

So it is implicit that this character class means the alphanumeric
characters, unless the documentation is really saying that \w does
something funny on EBCDIC systems.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=wishlist
---
Site configuration information for perl 5.18.2:

Configured by Red Hat, Inc. at Tue Jan  7 14:45:19 UTC 2014.

Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.11.9-200.fc19.x86_64, archname=x86_64-linux-thread-multi
    uname='linux buildvm-12.phx2.fedoraproject.org 3.11.9-200.fc19.x86_64 #1 smp wed nov 20 21:22:24 utc 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wl,-z,relro  -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.18.2 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui
 _ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.8.2 20131212 (Red Hat 4.8.2-7)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.18'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro '

Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch3: support for libdir64
    Fedora Patch4: use libresolv instead of libbind
    Fedora Patch5: USE_MM_LD_RUN_PATH
    Fedora Patch6: Skip hostname tests, due to builders not being network capable
    Fedora Patch7: Dont run one io test due to random builder failures
    Fedora Patch9: Fix find2perl to translate ? glob properly (RT#113054)
    Fedora Patch10: Update h2ph(1) documentation (RT#117647)
    Fedora Patch11: Update pod2html(1) documentation (RT#117623)
    Fedora Patch12: Disable ornaments on perl5db AutoTrace tests (RT#118817)
    Fedora Patch14: Do not use system Term::ReadLine::Gnu in tests (RT#118821)
    Fedora Patch15: Define SONAME for libperl.so
    Fedora Patch16: Install libperl.so to -Dshrpdir value
    Fedora Patch18: Fix crash with \\&$glob_copy (RT#119051)
    Fedora Patch19: Fix coreamp.t rand test (RT#118237)
    Fedora Patch20: Reap child in case where exception has been thrown (RT#114722)
    Fedora Patch21: Fix using regular expressions containing multiple code blocks (RT#117917)
    Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
    Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux

---
@INC for perl 5.18.2:
    /home/eda/lib/perl5/
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .

---
Environment for perl 5.18.2:
    HOME=/home/eda
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_GB.UTF-8
    LC_MESSAGES=en_GB.UTF-8
    LC_MONETARY=en_GB.UTF-8
    LC_NUMERIC=en_GB.UTF-8
    LC_TIME=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
    PERL5LIB=/home/eda/lib/perl5/
    PERL_BADLANG (unset)
    SHELL=/bin/bash

-- 
Ed Avis <eda@waniasset.com>


______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About