perl.perl5.porters | Postings from January 2016

Re: [perl #127288] [PATCH] I18N::Langinfo doesn't set UTF8 flag

Karl Williamson
January 27, 2016 06:24
On 01/16/2016 05:15 AM, Niko Tyni (via RT) wrote:
> # New Ticket Created by  Niko Tyni
> # Please include the string:  [perl #127288]
> # in the subject line of all future correspondence about this issue.
> # <URL: >
> This is a bug report for perl from Niko Tyni <>,
> generated with the help of perlbug 1.40 running under perl 5.23.7.
> I18N::Langinfo::langinfo() can return UTF-8 strings but doesn't set
> the UTF8 flag on them.
> LC_ALL=fr_FR.UTF-8 perl -MDevel::Peek -MI18N::Langinfo=langinfo,MON_12 -e 'Dump langinfo(MON_12())'
> SV = PV(0x1173b20) at 0x1172f30
>    REFCNT = 1
>    PV = 0x118e960 "d\303\251cembre"\0
>    CUR = 9
>    LEN = 11
> The attached somewhat clumsy set of two patches fixes this, but perhaps
> it's time to make something like __is_cur_LC_category_utf8() more visible?
> (This was prompted by Time-Format test suite starting to fail on Perl
> 5.22 in non-English locales, presumably since commit 9717af6d049902fc8
> which fixed POSIX::strftime() in a similar way. The Time-Format test
> suite compares POSIX::strftime() and I18N::Langinfo results. See
> )
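
To make the symptom in the quoted Dump output concrete, here is a minimal sketch in plain Perl. It only assumes the nine bytes shown in the PV line; it does not call langinfo() itself, and the variable names are illustrative.

```perl
use strict;
use warnings;

# The nine bytes from the Dump output above: UTF-8 encoded "d?cembre"
# where "?" is the two-byte sequence \303\251 (e with acute accent).
my $bytes = "d\303\251cembre";

# Decode a copy in place.  utf8::decode returns false if the bytes are
# not well-formed UTF-8, in which case the copy is left unchanged.
my $chars = $bytes;
my $ok    = utf8::decode($chars);

printf "bytes: length=%d utf8_flag=%d\n",
    length($bytes), utf8::is_utf8($bytes) ? 1 : 0;
printf "chars: length=%d utf8_flag=%d decoded=%d\n",
    length($chars), utf8::is_utf8($chars) ? 1 : 0, $ok ? 1 : 0;
```

Without the flag, perl counts the accented letter as two characters, which is why a comparison against a correctly flagged POSIX::strftime() result can fail.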

I'm not sure about the best way to go about this.  I'm still reluctant 
to make is_cur_LC_category_utf8() publicly accessible.  But maybe it's 
been out there long enough to do so.  And I can't think of any way to do 
this in XS code without exposing something that's not documented.

(In pure perl, within the scope of 'use locale', you can do fc("\xdf")
and see if the result is 'ss' or not, which it only will be if perl
thinks the LC_CTYPE locale is UTF-8.)
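
The pure-perl check just described could be wrapped up roughly as follows. This is only a sketch: the helper name ctype_locale_is_utf8() is made up for illustration, it needs a perl new enough to have fc (5.16 or later), and it only answers the question for LC_CTYPE, not for other categories.

```perl
use strict;
use warnings;
use feature 'fc';

# Hypothetical helper (not part of any module): returns true if perl
# thinks the current LC_CTYPE locale is UTF-8.
sub ctype_locale_is_utf8 {
    use locale;    # lexically scoped to this sub

    # U+00DF (sharp s) folds to "ss" only under Unicode rules, which
    # 'use locale' applies only when LC_CTYPE is a UTF-8 locale.
    return fc("\xdf") eq 'ss';
}

print ctype_locale_is_utf8()
    ? "LC_CTYPE looks like UTF-8\n"
    : "LC_CTYPE does not look like UTF-8\n";
```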

So if someone has an opinion about this, chime in.

Here are a couple of problems with your patch.

It is possible for the various locale categories to be in different 
locales.  For example, LC_TIME could be in a UTF-8 locale, while 
LC_NUMERIC is not.  Thus a fully accurate program would need to know the 
category of the field being requested, and call 
is_cur_LC_category_utf8() with that category.  Most fields that 
nl_langinfo(3) returns on my Linux box are LC_TIME, but not all. So your 
patch could be wrong if a different field, like the radix character, is 
being requested.  Since the fields are not standardized, I don't know 
how to make it completely accurate, but one could know the few 
exceptions and assume everything else is LC_TIME.  There is a UTF-8 
locale on the dromedary machine that has a radix character that is only 
expressible in UTF-8.  And there are plenty of currency signs in 
LC_MONETARY that are UTF-8 as well.
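
The "know the few exceptions and assume everything else is LC_TIME" idea might look something like the sketch below.  The table and the function name category_for_item() are hypothetical; the exceptions listed are the categories POSIX assigns to these langinfo items, and platform-specific extra items are not covered.  Names are kept as strings so the sketch does not depend on which constants a given platform defines.

```perl
use strict;
use warnings;

# Items that POSIX places outside LC_TIME (hypothetical lookup table).
my %category_for = (
    CODESET   => 'LC_CTYPE',
    RADIXCHAR => 'LC_NUMERIC',
    THOUSEP   => 'LC_NUMERIC',
    CRNCYSTR  => 'LC_MONETARY',
    YESEXPR   => 'LC_MESSAGES',
    NOEXPR    => 'LC_MESSAGES',
);

# Return the name of the locale category that governs a langinfo item,
# assuming LC_TIME for anything not listed above.
sub category_for_item {
    my ($item_name) = @_;
    return $category_for{$item_name} // 'LC_TIME';
}

for my $item (qw(MON_12 RADIXCHAR CRNCYSTR)) {
    printf "%-10s => %s\n", $item, category_for_item($item);
}
```

The UTF-8 check would then be made against the category returned here rather than against one fixed category.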

For the .t file, checking Configure for whether setlocale, POSIX, etc.
are available is not the full story.  In the 5.23 series I have tried to
standardize all finding of available locales to use
functions from t/  One of them is locales_enabled() which 
knows about more things affecting whether locale handling is available
than any of the tests that I converted to use it.  Since your test is in
/ext, it can freely use core tools, and it's simpler to use this anyway.
> ---
> Flags:
>      category=library
>      severity=low
>      Type=Patch
>      PatchStatus=HasPatch
>      module=I18N::Langinfo
> ---
> Site configuration information for perl 5.23.7:
> Configured by niko at Sat Jan 16 13:32:56 EET 2016.
> Summary of my perl5 (revision 5 version 23 subversion 7) configuration:
>    Local Commit: 67271c83612f3ab129c8326d07ca55104a2f23f8
>    Ancestor: dff8a39d0194aa70bc091208b6a62bffe2148f0a
>    Platform:
>      osname=linux, osvers=4.3.0-1-amd64, archname=x86_64-linux
>      uname='linux estella 4.3.0-1-amd64 #1 smp debian 4.3.3-2 (2015-12-17) x86_64 gnulinux '
>      config_args='-Dusedevel -des'
>      hint=previous, useposix=true, d_sigaction=define
>      useithreads=undef, usemultiplicity=undef
>      use64bitint=define, use64bitall=define, uselongdouble=undef
>      usemymalloc=n, bincompat5005=undef
>    Compiler:
>      cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
>      optimize='-O2',
>      cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
>      ccversion='', gccversion='5.3.1 20151219', gccosandvers=''
>      intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
>      d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
>      ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
>      alignbytes=8, prototype=define
>    Linker and Libraries:
>      ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib'
>      libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed /usr/include/x86_64-linux-gnu /usr/lib
>      libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
>      perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
>, so=so, useshrplib=false, libperl=libperl.a
>      gnulibc_version='2.21'
>    Dynamic Linking:
>      dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
>      cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'
> Locally applied patches:
>      b8ea0feef314dd5432298db82d9f2b8afed1442f
>      67271c83612f3ab129c8326d07ca55104a2f23f8
> ---
> @INC for perl 5.23.7:
>      lib
>      /usr/local/lib/perl5/site_perl/5.23.7/x86_64-linux
>      /usr/local/lib/perl5/site_perl/5.23.7
>      /usr/local/lib/perl5/5.23.7/x86_64-linux
>      /usr/local/lib/perl5/5.23.7
>      .
> ---
> Environment for perl 5.23.7:
>      HOME=/home/niko
>      LANG=en_US.UTF-8
>      LANGUAGE (unset)
>      LC_CTYPE=fi_FI.UTF-8
>      LD_LIBRARY_PATH (unset)
>      LOGDIR (unset)
>      PATH=/home/niko/bin:/home/niko/bin:/home/niko/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/sbin:/usr/sbin:/sbin:/usr/sbin
>      PERL_BADLANG (unset)
>      SHELL=/bin/zsh
