[perl #108164] regex property extensions: \p{X-Confusable=A} from UTS#39
From: tchrist1
Date: January 13, 2012 07:27
Subject: [perl #108164] regex property extensions: \p{X-Confusable=A} from UTS#39
Message ID: rt-3.6.HEAD-14510-1326468414-884.108164-75-0@perl.org
# New Ticket Created by tchrist1
# Please include the string: [perl #108164]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=108164 >
Currently, there is no (reasonable) way for the user to implement
properties like \p{X-Confusable=A} (that is, from UTS#39) on their own.
I feel this is a bug; hence, this filing.
Here are issues blocking the user-level implementation of such a scheme:
* The super-annoying new restriction that all user-defined properties *must*
start with /^I[sn]/ for them to be paid any attention to.
* There is no way to have "parameterized" \p{NAME=VALUE} user properties, even
when the NAME is an X-foo user name (let alone an X-VALUE user value for an
existing property). Consider how X-Confusable=VALUE needs to be able to
take, at a minimum, an arbitrary code point, and in fact probably an
arbitrary string, as its value.
* Apropos locating user-defined properties, there may be concerns about which
package the pattern was compiled in versus which one it is executed in,
along with the related issue of serialization needed for qr// recompilation.
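For reference, here is a minimal sketch of what a user can define today: a single binary property whose name begins with Is. The name IsConfusableWithA is hypothetical, and the three code points are a tiny hand-picked sample (Latin A, Greek Alpha, Cyrillic A), not the UTS#39 data. Note that the sub has nowhere to receive a VALUE from the pattern, which is the second issue above.

```perl
use strict;
use warnings;

# Hypothetical user-defined binary property. The name must begin with
# "Is" or "In" to be recognized as user-defined. The code points are an
# illustrative sample only, not the real UTS#39 confusables table.
sub IsConfusableWithA {
    return join "", map { "$_\n" } qw(0041 0391 0410);
}

# The property is looked up in the package where the pattern is compiled.
my @hits = grep { /\A\p{IsConfusableWithA}\z/ } "A", "\x{391}", "\x{410}", "B";
printf "%d of 4 sample characters matched\n", scalar @hits;
```

The target character "A" is baked into the sub name here; there is no \p{IsConfusableWith=A} form available to user code.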
Because it is not possible for the user to do this for himself, I
necessarily request that it be fully implemented in the core for v5.18.
Currently only user-defined binary properties are allowed, which is not good
enough, because it's nuts to expect people to write a \p{Is_X-Confusable__A}
binary property or similar ridiculousness. Even worse, you'd have to have a
special function for *EVERY POSSIBLE UNICODE CODE POINT*, and you could never
do full strings. You surely do not want a hundred thousand things in the
symbol table -- or a million -- nor do you want a hundred thousand little
"XConfus" *.pl files, either.
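To make the scaling problem concrete, the following sketch installs one binary property sub per target value. Everything in it is an assumption for illustration: the %sample table holds two hand-picked entries rather than real UTS#39 data, and the IsConfusableWith... names and the confusable_re helper are hypothetical. Covering every code point this way would mean one symbol-table entry per value.

```perl
use strict;
use warnings;

# Illustrative sample only; the real confusables data is far larger.
my %sample = (
    A => [0x41, 0x391, 0x410],   # Latin A, Greek Alpha, Cyrillic A
    O => [0x4F, 0x39F, 0x41E],   # Latin O, Greek Omicron, Cyrillic O
);

# Install one binary property sub per target value (hypothetical names).
for my $target (keys %sample) {
    my $body = join "", map { sprintf "%04X\n", $_ } @{ $sample{$target} };
    no strict 'refs';
    *{"main::IsConfusableWith$target"} = sub { $body };
}

# Build the pattern at run time, after the subs have been installed.
sub confusable_re {
    my ($target) = @_;
    my $src = '\p{IsConfusableWith' . $target . '}';
    return qr/\A$src\z/;
}

my $re = confusable_re('O');
print "U+039F matches\n" if "\x{39F}" =~ $re;
print "U+0391 does not match\n" unless "\x{391}" =~ $re;
```

Two values cost two subs; the approach has no way to handle a multi-character string as the value at all.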
Yes, that's asking a great deal, but we are given no choice: currently only
the core can do this because of these bugs related to user properties.
Therefore a perfectly reasonable alternative to implementing it in the core
is *TO MAKE IT POSSIBLE* for a user to implement it as a module outside the
core. I would actually prefer that solution. But right now, bugs get in
the way, so an in-core implementation tracking UTS#39 is the only way to do
this under current technology.
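As a stopgap that sidesteps \p{} entirely, a module can hand back a compiled bracketed class for interpolation into a larger pattern. This is only a sketch under stated assumptions, not the property syntax requested above: confusable_class and %sample are hypothetical names, the table is a hand-picked sample rather than UTS#39 data, and only single-character values are handled.

```perl
use strict;
use warnings;

# Illustrative sample only, not the UTS#39 confusables table.
my %sample = (
    A => [0x41, 0x391, 0x410],   # Latin A, Greek Alpha, Cyrillic A
);

# Hypothetical helper: return a compiled character class matching any
# character in the sample set for $value.
sub confusable_class {
    my ($value) = @_;
    my $set = $sample{$value} or die "no sample data for '$value'";
    my $class = join "", map { sprintf "\\x{%X}", $_ } @$set;
    return qr/[$class]/;
}

my $A = confusable_class('A');

# Cyrillic A and Greek Alpha standing in for Latin A.
my $spoof = "P\x{410}YP\x{391}L";
print "looks like PAYPAL\n" if $spoof =~ /\AP${A}YP${A}L\z/;
```

The cost is that callers must interpolate a variable instead of writing a property name, so the check cannot appear in a pattern that is a plain string or comes from configuration.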
See http://stackoverflow.com/a/8841591/471272 for elaboration of the
"confusable" issue and proposed property, including how this relates
to UTS#39.
--tom
Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
  Platform:
    osname=openbsd, osvers=4.4, archname=OpenBSD.i386-openbsd
    uname='openbsd chthon 4.4 generic#0 i386 '
    config_args='-des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (propolice)', gccosandvers='openbsd4.4'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lgdbm -lm -lutil -lc
    perllibs=-lm -lutil -lc
    libc=/usr/lib/libc.so.48.0, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC -L/usr/local/lib -fstack-protector'
Characteristics of this binary (from libperl):
  Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
                        USE_PERL_ATOF
  Built under openbsd
  Compiled at Jun 11 2011 11:48:28
  %ENV:
    PERL_UNICODE="SA"
  @INC:
    /usr/local/lib/perl5/site_perl/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/site_perl/5.14.0
    /usr/local/lib/perl5/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/5.14.0
    /usr/local/lib/perl5/site_perl/5.12.3
    /usr/local/lib/perl5/site_perl/5.11.3
    /usr/local/lib/perl5/site_perl/5.10.1
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/local/lib/perl5/site_perl/5.8.7
    /usr/local/lib/perl5/site_perl/5.8.0
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .