
[perl #118197] perl doesn't cope with non-ASCII decimal separators

From: Nicholas Clark
Date: May 27, 2013 13:13
Subject: [perl #118197] perl doesn't cope with non-ASCII decimal separators
Message ID: rt-3.6.HEAD-2650-1369660396-470.118197-75-0@perl.org
# New Ticket Created by  Nicholas Clark 
# Please include the string:  [perl #118197]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=118197 >


Perl doesn't cope with non-ASCII decimal separators.

LC_ALL=ps_AF.utf8 ./perl -Ilib -MPOSIX=:locale_h -C63 -MDevel::Peek -e 'setlocale(LC_ALL, "ps_AF.utf8"); use locale; Dump (sprintf "%g\n", 3.14)'
SV = PV(0xa1af6f8) at 0xa1c85f8
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK)
  PV = 0xa1cc668 "3\331\25314\n"\0
  CUR = 6
  LEN = 12

The decimal separator in the "Pashto locale for Afghanistan" is U+066B,
ARABIC DECIMAL SEPARATOR.

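So that's Devel::Peek showing that we have mojibake: the PV holds the
UTF-8 bytes of U+066B (\331\253), but the UTF8 flag is not set, so perl
treats them as two separate characters.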
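
To make the dump concrete, here is a minimal sketch that takes the same six bytes and compares how perl counts them before and after decoding. It assumes only the core Encode module; the variable names are illustrative and not part of the original report.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes shown in the Devel::Peek dump: "3\331\25314\n"
my $bytes = "3\xD9\xAB14\n";

# Without the UTF8 flag, perl counts six one-byte characters (CUR = 6).
my $raw_len = length $bytes;

# Decoded as UTF-8, the two middle bytes become a single character.
my $chars   = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
my $dec_len = length $chars;
my $sep     = sprintf "U+%04X", ord substr($chars, 1, 1);

# prints "6 bytes, 5 characters, separator U+066B"
printf "%d bytes, %d characters, separator %s\n", $raw_len, $dec_len, $sep;
```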

It isn't clear *how* to fix this. It might need another -C flag to say
"believe that the strings in the locales are in UTF-8".
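
Until something like that exists in core, the same idea can be approximated in user code. The sketch below is only an illustration, not a proposed fix: decoded_sprintf is a hypothetical helper name, and it assumes that the codeset reported by langinfo(CODESET) is one Encode recognises. If the string already carries the UTF8 flag, or the codeset is unknown, it returns the string unchanged.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Hypothetical helper: format under the current locale, then decode the
# result using the codeset that the locale itself reports.
sub decoded_sprintf {
    my ($format, @args) = @_;
    my $formatted = do { use locale; sprintf $format, @args };

    # Already character data: nothing to decode.
    return $formatted if utf8::is_utf8($formatted);

    # Unknown codeset name: return the bytes unchanged rather than guess.
    my $encoding = find_encoding(langinfo(CODESET)) or return $formatted;
    return $encoding->decode($formatted);
}

setlocale(LC_ALL, "ps_AF.utf8");    # no effect if the locale is not installed
binmode STDOUT, ':encoding(UTF-8)';
print decoded_sprintf("%g\n", 3.14);
```

The same decode step could in principle be applied to other locale-supplied strings, but whether perl itself should do this is exactly the open question.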

If this approach makes sense, we probably should do something similar for the
strings returned by strerror().

Strangely, fa_IR.utf8 now seems to use '.' as the decimal separator.
About a decade ago we identified it as an interesting, troublesome
test case, because at that time it used a multibyte separator (presumably
U+066B, but I don't know for sure).


Despite this bug, all tests pass in the ps_AF.utf8 locale.

Nicholas Clark



