Front page | perl.perl5.porters |
Postings from September 2014
[perl #122853] Guarantee 0-9, A-Z, a-z character classes
Thread Next
From:
Ed Avis
Date:
September 26, 2014 08:49
Subject:
[perl #122853] Guarantee 0-9, A-Z, a-z character classes
Message ID:
rt-4.0.18-3300-1411721371-701.122853-75-0@perl.org
# New Ticket Created by "Ed Avis"
# Please include the string: [perl #122853]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/Ticket/Display.html?id=122853 >
For a long time Perl programmers have used [A-Z] in regexps to match
an uppercase letter and [a-z] for lowercase. For almost as long,
there have been admonishments in various places that these constructs
won't work on non-ASCII systems where the letters of the alphabet
don't have consecutive character codes, and that something like \w or
[[:upper:]] should be used instead.
However, \w changes behaviour depending on locale and the /a and /aa
modifiers; [[:upper:]] and friends seem okay, but not everyone will
be familiar with the POSIX spec or know where to look them up.
A-Z and a-z seem like a clearer way to say what you mean.
There is also the question of whether to use 0-9 instead of \d.
Again, \d changes based on locale or ASCII flags, while 0-9 always
matches exactly the ten digits and nothing else. However there's a
nagging feeling that some system out there might (in principle) not
use consecutive character codes for the digits, so it might not work.
This bug report is to request that Perl guarantee in its documentation
the following equivalences:
[A-Z] [ABCDEFGHIJKLMNOPQRSTUVWXYZ]
[a-z] [abcdefghijklmnopqrstuvwxyz]
[0-9] [0123456789]
If the EBCDIC port is still active, then some programming work might
be needed to make sure these ranges do as documented on EBCDIC.
That will have the nice side-effect of making plenty of existing
code that does use A-Z work correctly on EBCDIC systems.
If there are currently no Perl ports to non-ASCII systems (and there
aren't likely to be any new ones in the future), then no code or
behaviour change is needed, just a note in the documentation.
Note that currently perlre does talk about
"\w" means the 63 characters "[A-Za-z0-9_]"
So it is implicit that this character class means the alphanumeric
characters, unless the documentation is really saying that \w does
something funny on EBCDIC systems.
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=wishlist
---
Site configuration information for perl 5.18.2:
Configured by Red Hat, Inc. at Tue Jan 7 14:45:19 UTC 2014.
Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
Platform:
osname=linux, osvers=3.11.9-200.fc19.x86_64, archname=x86_64-linux-thread-multi
uname='linux buildvm-12.phx2.fedoraproject.org 3.11.9-200.fc19.x86_64 #1 smp wed nov 20 21:22:24 utc 2013 x86_64 x86_64 x86_64 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.18.2 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui
_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic',
cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.8.2 20131212 (Red Hat 4.8.2-7)', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -fstack-protector'
libpth=/usr/local/lib64 /lib64 /usr/lib64
libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.18'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags'
cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro '
Locally applied patches:
Fedora Patch1: Removes date check, Fedora/RHEL specific
Fedora Patch3: support for libdir64
Fedora Patch4: use libresolv instead of libbind
Fedora Patch5: USE_MM_LD_RUN_PATH
Fedora Patch6: Skip hostname tests, due to builders not being network capable
Fedora Patch7: Dont run one io test due to random builder failures
Fedora Patch9: Fix find2perl to translate ? glob properly (RT#113054)
Fedora Patch10: Update h2ph(1) documentation (RT#117647)
Fedora Patch11: Update pod2html(1) documentation (RT#117623)
Fedora Patch12: Disable ornaments on perl5db AutoTrace tests (RT#118817)
Fedora Patch14: Do not use system Term::ReadLine::Gnu in tests (RT#118821)
Fedora Patch15: Define SONAME for libperl.so
Fedora Patch16: Install libperl.so to -Dshrpdir value
Fedora Patch18: Fix crash with \\&$glob_copy (RT#119051)
Fedora Patch19: Fix coreamp.t rand test (RT#118237)
Fedora Patch20: Reap child in case where exception has been thrown (RT#114722)
Fedora Patch21: Fix using regular expressions containing multiple code blocks (RT#117917)
Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux
---
@INC for perl 5.18.2:
/home/eda/lib/perl5/
/usr/local/lib64/perl5
/usr/local/share/perl5
/usr/lib64/perl5/vendor_perl
/usr/share/perl5/vendor_perl
/usr/lib64/perl5
/usr/share/perl5
.
---
Environment for perl 5.18.2:
HOME=/home/eda
LANG=en_GB.UTF-8
LANGUAGE (unset)
LC_COLLATE=C
LC_CTYPE=en_GB.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
PERL5LIB=/home/eda/lib/perl5/
PERL_BADLANG (unset)
SHELL=/bin/bash
--
Ed Avis <eda@waniasset.com>
______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________
Thread Next
-
[perl #122853] Guarantee 0-9, A-Z, a-z character classes
by Ed Avis