perl.perl5.porters | Postings from April 2013

[perl #117787] "use locale;" breaks \w on matching c-cedilla, o-diaeresis and u-diaeresis under tr_TR.utf8 and de_DE.utf8 locales

From:
Dominic Hargreaves
Date:
April 28, 2013 17:23
Subject:
[perl #117787] "use locale;" breaks \w on matching c-cedilla, o-diaeresis and u-diaeresis under tr_TR.utf8 and de_DE.utf8 locales
Message ID:
rt-3.6.HEAD-28177-1367169767-1915.117787-75-0@perl.org
# New Ticket Created by  Dominic Hargreaves 
# Please include the string:  [perl #117787]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=117787 >



This is a bug report for perl from dom@earth.li,
generated with the help of perlbug 1.39 running under perl 5.17.12.

From <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=529305>:

----------------
Showcase:
(requires the tr_TR.utf8 and de_DE.utf8 locales, installed via
'dpkg-reconfigure locales' or the locales-all package)

 #!/usr/bin/perl
 use strict;
 use warnings;
 use POSIX qw(setlocale LC_ALL);
 setlocale(LC_ALL, "tr_TR.utf8");
 print "Locale is ", setlocale(LC_ALL), "\n";

 use locale;
 use utf8;
 binmode STDOUT, ":utf8";

 print "$_ is " . ( /\w/ ? "" : "not " ) . "a word character\n"
    for qw( ç ö ş ü ğ ı İ );

The output is

 Locale is tr_TR.utf8
 ç is not a word character
 ö is not a word character
 ş is a word character
 ü is not a word character
 ğ is a word character
 ı is a word character
 İ is a word character

Looking (with my uneducated eyes) in /usr/share/i18n/locales/tr_TR, it seems
that at least c-cedilla (<U00E7> in lowercase and <U00C7> in uppercase) should
be treated as an "alpha" character, so the problem seems to be in perl's
interpretation.
----------------

This is reproducible with 8b3945e7b7b7ae6fd2369864ebe169bd9a91cf4e
(current blead) and has been the case since at least 5.8.8.
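
One pattern in the output above may help narrow this down. The following is
a minimal sketch, not part of the original Debian report, that prints the
code point of each test character and whether it lies below U+0100. The
helper name below_0x100 is hypothetical and exists only for illustration.
The script deliberately omits "use locale" and setlocale, so its result
does not depend on which locales are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                   # this source file contains literal UTF-8 characters
binmode STDOUT, ":utf8";    # encode wide characters on output

# Hypothetical helper (an assumption for illustration, not from the report):
# true when the character's code point is below U+0100.
sub below_0x100 {
    my ($char) = @_;
    return ord($char) < 0x100;
}

# Same character list as the showcase script in the report.
for my $char (qw( ç ö ş ü ğ ı İ )) {
    printf "%s  U+%04X  %s\n", $char, ord($char),
        below_0x100($char) ? "below U+0100" : "U+0100 or above";
}
```

The three characters reported as "not a word character" (ç, ö, ü) are the
only ones in the list with code points below U+0100; the four that match
\w are all at U+0100 or above. This is only an observation about the test
data and does not by itself establish where the fault lies.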

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=library
    severity=low
    module=locale
---
Site configuration information for perl 5.17.12:

Configured by dom at Sun Apr 28 17:39:32 BST 2013.

Summary of my perl5 (revision 5 version 17 subversion 12) configuration:
  Commit id: 8b3945e7b7b7ae6fd2369864ebe169bd9a91cf4e
  Platform:
    osname=linux, osvers=3.2.0-4-686-pae, archname=i686-linux-thread-multi-64int
    uname='linux callisto 3.2.0-4-686-pae #1 smp debian 3.2.41-2 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Dldflags=-Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Duse64bitint -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -des -Dusedevel'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-z,relro -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/local/lib/perl5/5.17.12/i686-linux-thread-multi-64int/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -Wl,-z,relro -L/usr/local/lib -fstack-protector'

Locally applied patches:
    

---
@INC for perl 5.17.12:
    lib
    /usr/local/lib/perl5/site_perl/5.17.12/i686-linux-thread-multi-64int
    /usr/local/lib/perl5/site_perl/5.17.12
    /usr/local/lib/perl5/5.17.12/i686-linux-thread-multi-64int
    /usr/local/lib/perl5/5.17.12
    .

---
Environment for perl 5.17.12:
    HOME=/home/dom
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/home/dom/working/perl:
    LOGDIR (unset)
    PATH=~/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash

