[perl #117787] "use locale;" breaks \w on matching c-cedilla, o-diaeresis and u-diaeresis under tr_TR.utf8 and de_DE.utf8 locales

Dominic Hargreaves
April 28, 2013 17:23
# New Ticket Created by  Dominic Hargreaves 
# Please include the string:  [perl #117787]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl 5.17.12.

>From <>:

(Requires installing the tr_TR.utf8 and de_DE.utf8 locales via 'dpkg-reconfigure
locales', or installing the locales-all package.)

 use strict;
 use warnings;
 use POSIX qw(setlocale LC_ALL);
 setlocale(LC_ALL, "tr_TR.utf8");
 print "Locale is ", setlocale(LC_ALL), "\n";

 use locale;
 use utf8;
 binmode STDOUT, ":utf8";

 print "$_ is " . ( /\w/ ? "" : "not " ) . "a word character\n"
    for qw( ç ö ş ü ğ ı İ );

The output is

 Locale is tr_TR.utf8
 ç is not a word character
 ö is not a word character
 ş is a word character
 ü is not a word character
 ğ is a word character
 ı is a word character
 İ is a word character
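
One pattern in the output above may help narrow this down: the three
characters that fail (ç, ö, ü) all have Unicode code points below 256, while
the four that match (ş, ğ, ı, İ) are all at 256 or above. The sketch below
only prints the code points to show that split. It does not touch locales,
and whether the split is the actual cause is an assumption that has not been
verified. The helper names codepoint and fits_in_byte are made up for this
illustration.

```perl
use strict;
use warnings;
use utf8;                        # the character literals below are UTF-8
binmode STDOUT, ":utf8";

# Format the code point of a one-character string as U+XXXX.
sub codepoint { sprintf "U+%04X", ord $_[0] }

# True when the code point is below 256, i.e. it fits in a single byte.
sub fits_in_byte { ord($_[0]) < 256 }

# Same characters as in the test case above, in the same order.
for my $ch (qw( ç ö ş ü ğ ı İ )) {
    printf "%s %s %s\n", $ch, codepoint($ch),
        fits_in_byte($ch) ? "below 256" : "256 or above";
}
```

This reports U+00E7, U+00F6 and U+00FC as below 256, and U+015F, U+011F,
U+0131 and U+0130 as 256 or above.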

Looking (with my uneducated eyes) in /usr/share/i18n/locales/tr_TR, it seems
that at least c-cedilla (U00E7 in lowercase and U00C7 in uppercase) should be
treated as an "alpha" character, so the problem seems to be in perl's

This is reproducible with 8b3945e7b7b7ae6fd2369864ebe169bd9a91cf4e
(current blead) and has been the case since at least 5.8.8.

[Please do not change anything below this line]
Site configuration information for perl 5.17.12:

Configured by dom at Sun Apr 28 17:39:32 BST 2013.

Summary of my perl5 (revision 5 version 17 subversion 12) configuration:
  Commit id: 8b3945e7b7b7ae6fd2369864ebe169bd9a91cf4e
    osname=linux, osvers=3.2.0-4-686-pae, archname=i686-linux-thread-multi-64int
    uname='linux callisto 3.2.0-4-686-pae #1 smp debian 3.2.41-2 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Dldflags=-Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Duse64bitint -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -des -Dusedevel'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-z,relro -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true,
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/local/lib/perl5/5.17.12/i686-linux-thread-multi-64int/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -Wl,-z,relro -L/usr/local/lib -fstack-protector'

Locally applied patches:

@INC for perl 5.17.12:

Environment for perl 5.17.12:
    LANGUAGE (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)

