
What to do about locale and threads in 5.24, and beyond

Karl Williamson
March 21, 2016 20:23
A ticket has exposed a fundamental flaw in the interaction of perl and 
ithreads, above and beyond what the ticket describes.  Tony Cook opened 
my eyes to problems that I hadn't recognized before.

We warn people against changing the locale in threaded applications, 
since setlocale() changes process-global state, which affects all 
threads immediately.  So we aren't responsible if someone ignores that 
advice.

The problem is that perl itself ignores that advice.

This appears to date back to the original design, though I have since 
extended the practice to more places using a similar paradigm, and it is 
one of those extensions that is failing in the ticket.

To give some background: Every C program has an underlying locale, 
whether it is aware of it or not.  The default is the C locale, but at 
startup, the perl interpreter uses the environment to see if a different 
locale is warranted, and if so, to set up that underlying locale.  But 
most perl statements get compiled into code that doesn't depend on the 
locale at all.  Only within the scope of 'use locale', or when C 
functions are called directly, does the locale get noticed.  The latter 
could happen by calling, for example, POSIX::strftime(), or by using 
backticks, qx, or system().
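
As a rough illustration of the "underlying locale" idea, here is a minimal C sketch.  The helper names locale_name_is() and adopt_environment_locale() are made up for this example and are not perl internals; perl's real startup code is considerably more involved.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Report whether the locale in effect for `category` has the given name.
 * Passing NULL to setlocale() only queries; it changes nothing. */
int locale_name_is(int category, const char *expected)
{
    const char *name;

    assert(expected != NULL);
    name = setlocale(category, NULL);
    return name != NULL && strcmp(name, expected) == 0;
}

/* Switch every category to whatever the environment (LC_ALL, LC_*, LANG)
 * requests.  Returns 0 on success, -1 if the environment names a locale
 * that the system cannot provide. */
int adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL ? 0 : -1;
}
```

Until something calls setlocale() with a non-NULL second argument, every category reports "C", which is the default the paragraph above refers to.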

Recall that there are different locale categories, and that the locale 
could be set up so that warning messages are in Spanish, while time 
units are in Arabic, and monetary units in Chinese.  Most often, 
however, all categories will be in the same locale.

The numeric category is handled specially.  Instead of keeping the 
underlying locale in the one given by the environment upon set-up, or 
changed to by the program calling POSIX::setlocale(), perl keeps it in 
the C locale, while remembering what it really is meant to be.  This is 
because the radix character can be changed to something other than dot 
in many locales, but the Perl language says the radix character is a 
dot.  So when parsing a floating point number, in 'eval' say, it will 
see a dot and be happy, rather than seeing a comma (as in many European 
locales) and croak.  The locale is switched to the user-desired one only 
briefly during certain operations, mainly printf.  So, the comma (or 
whatever) will be printed correctly according to the format.
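
The brief switch described above might look roughly like the following at the C level.  This is only a sketch of the pattern, not perl's actual code: the function name, the fixed "%.2f" format and the 256-byte buffer for the saved name are all assumptions made for the example.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Format `value` using the radix character of `numeric_locale`, then put
 * LC_NUMERIC back the way it was.  Returns 0 on success, -1 on failure.
 * NOT thread-safe: setlocale() changes the locale of the whole process. */
int format_in_numeric_locale(char *buf, size_t size, double value,
                             const char *numeric_locale)
{
    char saved[256];
    const char *current;
    int written;

    assert(buf != NULL && size > 0 && numeric_locale != NULL);

    /* The string returned by a query may be overwritten by the next
     * setlocale() call, so copy it before switching. */
    current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, numeric_locale) == NULL)
        return -1;                      /* locale not available */

    /* This is the window during which every thread sees the new locale. */
    written = snprintf(buf, size, "%.2f", value);

    setlocale(LC_NUMERIC, saved);       /* back to the resting state */
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

Called with a German locale name (assuming one is installed on the system), the output would use a comma as the radix; called with "C" it uses a dot.  Everything between the two setlocale() calls is visible to all other threads.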

Now imagine a threaded program that has a 'use locale', but doesn't 
actually set locales, but the environment indicates that the terminal it 
is being run in is a German one, which has a comma for the radix.  Perl 
has set up all categories but LC_NUMERIC to be German, and made 
LC_NUMERIC be the C locale.  Then one thread decides to print a floating 
point number.  The underlying locale for all threads will be switched by 
perl to German.  If control changes to another thread during the 
interval of formatting up the text to print, before the locale is 
switched back, that thread will now unexpectedly have the German locale, 
and an eval (or various other things) will potentially fail.  I am 
unaware of any field reports of this actually happening.

Instead, the ticket is for something I coded, along the same lines. 
Recall that all the other categories are kept in their underlying 
locale, which is generally what you want.  But when trying to stringify 
an errno outside the scope of 'use locale', it should be in English. 
In that case perl changes the LC_MESSAGES locale back to C, gets the 
text, and switches back.  Again, if control is switched to another 
thread during the interval while the locale is 'C', that thread will 
have the wrong category, even though the program has itself not done 
anything with locales.  And in the ticket there are segfaults, which 
seems to me to indicate that setlocale() is not thread safe on Darwin.

This is bad.  A program not using locales can get segfaults because the 
core is doing locale stuff behind its back.  It is more severe than the 
LC_NUMERIC case, where the program has to have at least said it wanted 
locales paid attention to, by 'use locale'.

Tony discovered that POSIX 2008 defines some additional locale handling 
functions that can be applied at a thread (or even smaller) level.  I 
think, and I think he agrees, that the eventual solution is to convert 
to use these on platforms that have them.  Then, calling 
POSIX::setlocale() would not actually call setlocale(), but instead 
these thread-level functions.  The implication would be that once a 
thread was started, no other thread could change its locale at all.  I 
consider that a bug fix.  And the locale documentation would be changed 
to indicate that on such platforms you can use locale on threads.  One 
could use Config to determine if one is operating on such a platform.
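
For readers who have not met the POSIX 2008 functions, the sketch below shows the general shape of a per-thread switch using newlocale(), duplocale(), uselocale() and freelocale().  It is an illustration under assumptions, not a proposal for how perl should be structured: the function name and the "%.2f" format are invented for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#ifdef __APPLE__
#include <xlocale.h>    /* newlocale() and friends are declared here on Darwin */
#endif

/* Format `value` with the radix of `numeric_locale`, changing the locale
 * of the calling thread only.  Returns 0 on success, -1 on failure. */
int format_in_thread_locale(char *buf, size_t size, double value,
                            const char *numeric_locale)
{
    locale_t base, wanted, previous;
    int written;

    assert(buf != NULL && size > 0 && numeric_locale != NULL);

    /* Start from a copy of whatever this thread currently uses ... */
    base = duplocale(uselocale((locale_t)0));
    if (base == (locale_t)0)
        return -1;

    /* ... and replace just its LC_NUMERIC part.  On success newlocale()
     * takes ownership of `base`; on failure we must free it ourselves. */
    wanted = newlocale(LC_NUMERIC_MASK, numeric_locale, base);
    if (wanted == (locale_t)0) {
        freelocale(base);
        return -1;
    }

    previous = uselocale(wanted);       /* affects this thread only */
    written = snprintf(buf, size, "%.2f", value);
    uselocale(previous);                /* restore before freeing */
    freelocale(wanted);

    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

The global locale set by setlocale() is never touched here, so other threads cannot observe the temporary switch.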

The short term solution appears to be to use mutexes to prevent control 
from being transferred to another thread during the interval when the 
locale is in flux.  This would also be the final solution for platforms 
that don't have the thread-level locale functions.
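
A minimal sketch of the mutex idea, applied to the errno case from the ticket, could look like this.  The names locale_mutex and errno_text_in_c_locale() are hypothetical, and the code uses plain pthreads rather than perl's own mutex macros.

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical lock: every piece of code that changes (or relies on) the
 * locale would have to take it for the scheme to work. */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the "C" locale text for `errnum` into buf while holding the lock.
 * Returns 0 on success, -1 on failure. */
int errno_text_in_c_locale(int errnum, char *buf, size_t size)
{
    char saved[256];
    const char *current;
    const char *text;
    int status = -1;

    assert(buf != NULL && size > 0);

    pthread_mutex_lock(&locale_mutex);

    current = setlocale(LC_MESSAGES, NULL);
    if (current != NULL && strlen(current) < sizeof saved) {
        strcpy(saved, current);         /* query result may be overwritten */
        if (setlocale(LC_MESSAGES, "C") != NULL) {
            text = strerror(errnum);    /* copied out before unlocking */
            if (strlen(text) < size) {
                strcpy(buf, text);
                status = 0;
            }
            setlocale(LC_MESSAGES, saved);
        }
    }

    pthread_mutex_unlock(&locale_mutex);
    return status;
}
```

Note that the lock only protects code that also takes it.  A thread that calls a locale-sensitive C function without the lock can still observe the temporary "C" setting.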

We need to do this in 5.24 (it is a blocker) for the case of a program 
that nominally doesn't use locales at all.  It is pretty uncommon to 
stringify an errno, so performance shouldn't be an issue for the case in 
the ticket.

A question I have is what else should go into 5.24.

Tony realized something that I and my predecessors hadn't, and that is, 
when you query what a locale is, that locale can be changed out from 
under you by another thread, and hence your subsequent calls that assume 
that locale are invalid.  I *think* that just querying can cause a 
segfault on at least Darwin if another thread is manipulating the locale 
at the same time.  I am at a loss, otherwise, to explain the segfaults 
we were seeing under some circumstances there.

14 hours ago, it seemed to me and Tony that mutexes were required even 
for platforms that have the thread-level locale functions.  Since then, 
I believe I discovered a way around that.

I have done an audit of other places where there are potential issues.

Which of these should also go into 5.24?

Cygwin changes locales when converting some UTF-8 stuff.  I do not know 
what releases of Cygwin, if any, have the thread-level locale functions.  
A fix for 5.24 could be just using the mutexes.  I *suspect* that this 
is done even for programs that don't ever do a 'use locale', so I think 
this should go in 5.24, even though we have no reports about it.  Who 
knows?  Some reported failures could actually be this bug.

Under 'use locale', there is sometimes the need to decide if the current 
locale is a UTF-8 one.  And in cases where the categories don't all have 
the same locale, and the desired category isn't LC_CTYPE, the locale is 
changed.  It is relatively uncommon for the categories to have different 
locales.  I don't know about 5.24 for this.
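
To make the category switch concrete, here is one way such a check could be written in C.  Using nl_langinfo(CODESET) is an assumption of this sketch; it is not meant to describe the exact test perl performs, and the function name is invented.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Return 1 if the locale of `category` uses UTF-8, 0 if not, -1 on error.
 * NOT thread-safe: it may briefly change LC_CTYPE for the whole process. */
int category_locale_is_utf8(int category)
{
    char wanted[256], saved_ctype[256];
    const char *name;
    const char *codeset;
    int is_utf8;

    /* An LC_ALL query can return a composite name, so insist on one category. */
    assert(category != LC_ALL);

    name = setlocale(category, NULL);
    if (name == NULL || strlen(name) >= sizeof wanted)
        return -1;
    strcpy(wanted, name);

    name = setlocale(LC_CTYPE, NULL);
    if (name == NULL || strlen(name) >= sizeof saved_ctype)
        return -1;
    strcpy(saved_ctype, name);

    /* nl_langinfo(CODESET) reports on LC_CTYPE, so LC_CTYPE has to be
     * pointed at the locale of interest when the two differ. */
    if (strcmp(wanted, saved_ctype) != 0
        && setlocale(LC_CTYPE, wanted) == NULL)
        return -1;

    codeset = nl_langinfo(CODESET);
    is_utf8 = strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;

    setlocale(LC_CTYPE, saved_ctype);   /* harmless if nothing was changed */
    return is_utf8;
}
```

When all categories share one locale, the switch is skipped, which matches the observation that the problem only arises in the less common mixed-locale setups.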

Also under 'use locale', sometimes a warning message is printed when 
it's discovered there are bugs in the locale definition.  This is 
extremely rare, so might not be worth putting in 5.24.  On the other 
hand, it is simple to do, and performance isn't an issue due to its rarity.

The most problematic case is for the LC_NUMERIC issue described above. 
We have not gotten tickets to my knowledge attributable to this bug, and 
this has been in the design from likely Day 1.  On the other hand, the 
design had been implemented incorrectly until I fixed it some releases 
ago, and I don't remember if those fixes would have changed this or not.  
And, also, the scope that the mutexes would be on is larger than the 
other cases I've mentioned.  Functions that do this are in the public 
API, without any cautions about their needing to be of short-term use. 
So someone could call them and set up a long term mutex.  And you must 
have done a 'use locale', I *think*, for there to be a problem.

And finally, it might be that on some platforms, just querying a locale 
can lead to segfaults.  I think that must be part of what is going on in 
this ticket.  My interpretation of something Tony said from reading the 
glibc code, is that this shouldn't be an issue for that version of 
setlocale().  We could put mutexes around querying locales, and maybe 
setting.  Performance might be an issue.
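
Guarding a query could be as small as the following sketch.  As before, locale_mutex and snapshot_locale_name() are hypothetical names, and whether the extra locking is affordable is exactly the open performance question.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical lock shared with every piece of code that sets the locale. */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the current locale name for `category` into buf while holding the
 * lock, so that no cooperating thread can change it during the query.
 * Returns 0 on success, -1 if the name is unavailable or does not fit. */
int snapshot_locale_name(int category, char *buf, size_t size)
{
    const char *name;
    int status = -1;

    assert(buf != NULL && size > 0);

    pthread_mutex_lock(&locale_mutex);
    name = setlocale(category, NULL);
    if (name != NULL && strlen(name) < size) {
        strcpy(buf, name);              /* copy before releasing the lock */
        status = 0;
    }
    pthread_mutex_unlock(&locale_mutex);

    return status;
}
```

The copy is only a snapshot: once the lock is released the locale may change again, so any sequence of calls that depends on the answer would need to hold the lock for its whole duration.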
