perl.perl5.porters | Postings from January 2012

[perl #108164] regex property extensions: \p{X-Confusable=A} from UTS#39

From: tchrist1
Date: January 13, 2012 07:27
Subject: [perl #108164] regex property extensions: \p{X-Confusable=A} from UTS#39
Message ID: rt-3.6.HEAD-14510-1326468414-884.108164-75-0@perl.org
# New Ticket Created by  tchrist1 
# Please include the string:  [perl #108164]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=108164 >


Currently, there is no (reasonable) way for the user to implement
properties like \p{X-Confusable=A} (that is, from UTS#39) on their own.

I feel this is a bug; hence, this filing.

Here are issues blocking the user-level implementation of such a scheme:

 *  The super-annoying new restriction that all user-defined properties *must*
    start with /^I[sn]/ for them to be paid any attention to.

 *  There is no way to have "parameterized" \p{NAME=VALUE} user properties, even
    when the NAME is an X-foo user name (let alone an X-VALUE user value for an
    existing property).  Consider how X-Confusable=VALUE needs to be able to
    take, at a minimum, an arbitrary code point, and in fact probably an
    arbitrary string, as its value.

 *  Apropos locating user-defined properties, there may be concerns about which
    package the pattern was compiled in versus which one it is executed in,
    along with the related issue of serialization needed for qr// recompilation.
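
To make the second point concrete: the closest a user can get today to a
value-taking property is an ordinary function that builds a character class
at run time, entirely outside the \p{} syntax.  What follows is only a
minimal sketch under stated assumptions.  The table %CONFUSABLE_WITH and the
function confusable_class() are hypothetical names, and the code points are a
tiny illustrative subset, not the real UTS#39 data.

```perl
use strict;
use warnings;

# Hypothetical lookup table: target character => look-alike code points.
# Illustrative subset only; real data would come from the UTS#39 data files.
my %CONFUSABLE_WITH = (
    'A' => [ 0x0041, 0x0391, 0x0410 ],   # Latin A, Greek Alpha, Cyrillic A
    'O' => [ 0x004F, 0x039F, 0x041E ],   # Latin O, Greek Omicron, Cyrillic O
);

# Build a compiled character class for one target chosen at run time.
sub confusable_class {
    my ($target) = @_;
    my $points = $CONFUSABLE_WITH{$target}
        or die "no confusable data for '$target'";
    # Emit \x{HHHH} escapes so the regex engine parses each code point.
    my $class = join '', map { sprintf '\x{%04X}', $_ } @$points;
    return qr/[$class]/;
}

my $like_A = confusable_class('A');
print "found look-alike\n" if "\x{0410}BC" =~ /\A$like_A/;
```

This takes a value as its argument, but it is a function call, not a
property: it cannot be written as \p{X-Confusable=A} inside a pattern, which
is the whole point of this request.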

Because it is not possible for the user to do this for himself, I
necessarily request that it be fully implemented in the core for v5.18.

Currently only user-defined binary properties are allowed, which is not good
enough, because it's nuts to expect people to write a \p{Is_X-Confusable__A}
binary property or similar ridiculousness.  Even worse, you'd have to have a
special function for *EVERY POSSIBLE UNICODE CODE POINT*, and you could never 
do full strings.  You surely do not want a hundred thousand things in the 
symbol table -- or a million -- nor do you want a hundred thousand little 
"XConfus" *.pl files, either.
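
For illustration, here is roughly what one such binary property looks like
under the existing user-defined property mechanism.  This is a sketch, not a
proposal: the name IsConfusableWithA and the helper looks_like_A() are
hypothetical, and the three code points are an illustrative subset rather
than the UTS#39 data.  Note that the target "A" is baked into the sub name,
so every other target would need its own sub.

```perl
use strict;
use warnings;

# A user-defined property is a sub whose name starts with "Is" or "In"
# and which returns hex code points (or tab-separated ranges), one per line.
# Hypothetical name; illustrative subset of look-alikes for Latin A.
sub IsConfusableWithA {
    return <<'END';
0041
0391
0410
END
}

# Hypothetical helper: true if the single character resembles Latin A.
sub looks_like_A {
    my ($char) = @_;
    return $char =~ /\A\p{IsConfusableWithA}\z/ ? 1 : 0;
}

print looks_like_A("\x{0391}") ? "match\n" : "no match\n";   # Greek Alpha
print looks_like_A("B")        ? "match\n" : "no match\n";   # Latin B
```

Covering B, C, and every other code point the same way means one more sub
each, which is exactly the symbol-table explosion described above.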

Yes, that's asking a great deal, but we are given no choice: currently only
the core can do this because of these bugs related to user properties.

Therefore a perfectly reasonable alternative to implementing it in the core
is *TO MAKE IT POSSIBLE* for a user to implement it as a module outside the
core.  I would actually prefer that solution.  But right now, bugs get in
the way, so an in-core implementation tracking UTS#39 is the only way to do
this under current technology.

See http://stackoverflow.com/a/8841591/471272 for elaboration of the 
"confusable" issue and proposed property, including how this relates 
to UTS#39.

--tom

Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
   
  Platform:
    osname=openbsd, osvers=4.4, archname=OpenBSD.i386-openbsd
    uname='openbsd chthon 4.4 generic#0 i386 '
    config_args='-des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (propolice)', gccosandvers='openbsd4.4'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lgdbm -lm -lutil -lc
    perllibs=-lm -lutil -lc
    libc=/usr/lib/libc.so.48.0, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC  -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl): 
  Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
                        USE_PERL_ATOF
  Built under openbsd
  Compiled at Jun 11 2011 11:48:28
  %ENV:
    PERL_UNICODE="SA"
  @INC:
    /usr/local/lib/perl5/site_perl/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/site_perl/5.14.0
    /usr/local/lib/perl5/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/5.14.0
    /usr/local/lib/perl5/site_perl/5.12.3
    /usr/local/lib/perl5/site_perl/5.11.3
    /usr/local/lib/perl5/site_perl/5.10.1
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/local/lib/perl5/site_perl/5.8.7
    /usr/local/lib/perl5/site_perl/5.8.0
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .

