What to do about locale and threads in 5.24, and beyond
From:
Karl Williamson
Date:
March 21, 2016 20:23
Subject:
What to do about locale and threads in 5.24, and beyond
Message ID:
56F057FE.9010701@khwilliamson.com
https://rt.perl.org/Ticket/Display.html?id=127708 has exposed a
fundamental flaw in the interaction of perl and ithreads, above and
beyond what the ticket describes. Tony Cook opened my eyes to the
problems that I hadn't realized before.
We warn people against changing the locale in threaded applications,
since setlocale() changes process-global state, which affects all
threads immediately. So we aren't responsible if someone ignores that
advice.
The problem is that perl itself ignores that advice.
This appears to date back to the original design, though I, using a
similar paradigm, have extended how widely it is done, and it is one
of those extensions that is failing in the ticket.
To give some background: Every C program has an underlying locale,
whether it is aware of it or not. The default is the C locale, but at
startup, the perl interpreter uses the environment to see if a different
locale is warranted, and if so, to set up that underlying locale. But
most perl statements get compiled into code that doesn't depend on the
locale at all. Only within the scope of 'use locale', or when calling C
functions directly, does the locale get noticed. This latter could
happen by calling, for example, POSIX::strftime(), or by using
backticks, qx, or system().
Recall that there are different locale categories, and that the locale
could be set up so that warning messages are in Spanish, while time
units are in Arabic, and monetary units in Chinese. Most often,
however, all categories will be in the same locale.
The numeric category is handled specially. Instead of keeping the
underlying locale in the one given by the environment upon set-up, or
changed to by the program calling POSIX::setlocale(), perl keeps it in
the C locale, while remembering what it really is meant to be. This is
because the radix character can be changed to something other than dot
in many locales, but the Perl language says the radix character is a
dot. So when parsing a floating point number, in 'eval' say, it will
see a dot and be happy, rather than seeing a comma (as in many European
locales) and croak. The locale is switched to the user-desired one only
briefly during certain operations, mainly printf. So, the comma (or
whatever) will be printed correctly according to the format.
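
The switch-format-restore pattern described above can be sketched in C
roughly as follows. This is only an illustration, not perl's actual
code: format_in_locale() is a hypothetical helper, and the 256-byte
buffer for the saved locale name is an assumption.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not perl's API): format `value` using the radix
 * character of locale `wanted`, then restore the previous LC_NUMERIC.
 * Returns 0 on success, -1 if the switch could not be made. */
int format_in_locale(const char *wanted, double value, char *buf, size_t len)
{
    char saved[256];
    const char *cur;

    assert(buf != NULL && len > 0);

    /* The pointer returned by setlocale(..., NULL) may be overwritten
     * by a later setlocale() call, so copy the name before switching. */
    cur = setlocale(LC_NUMERIC, NULL);
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_NUMERIC, wanted) == NULL)
        return -1;                      /* locale not available */

    /* Between the two setlocale() calls, every thread in the process
     * sees the switched locale. */
    snprintf(buf, len, "%.2f", value);

    setlocale(LC_NUMERIC, saved);       /* back to the resting locale */
    return 0;
}
```

With a German locale installed, a call such as
format_in_locale("de_DE.UTF-8", 1.5, buf, sizeof buf) would be expected
to leave "1,50" in buf, while LC_NUMERIC is back in its previous state
on return.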
Now imagine a threaded program that has a 'use locale', but doesn't
actually set locales, but the environment indicates that the terminal it
is being run in is a German one, which has a comma for the radix. Perl
has set up all categories but LC_NUMERIC to be German, and made
LC_NUMERIC be the C locale. Then one thread decides to print a floating
point number. The underlying locale for all threads will be switched by
perl to German. If control changes to another thread during the
interval of formatting up the text to print, before the locale is
switched back, that thread will now unexpectedly have the German locale,
and an eval (or various other things) will potentially fail. I am
unaware of any field reports of this actually happening.
Instead, the ticket is for something I coded, along the same lines.
Recall that all the other categories are kept in their underlying
locale, which is generally what you want. But when trying to stringify
an errno outside the scope of 'use locale', it should be in English.
In that case perl changes the LC_MESSAGES locale back to C, gets the
text, and switches back. Again, if control is switched to another
thread during the interval while the locale is 'C', that thread will
have the wrong category, even though the program has itself not done
anything with locales. And in the ticket there are segfaults, which
seems to me to indicate that setlocale() is not thread safe on Darwin.
This is bad. A program not using locales can get segfaults because the
core is doing locale stuff behind its back. It is more severe than the
LC_NUMERIC case, where the program has to have at least said it wanted
locales paid attention to, by 'use locale'.
Tony discovered that POSIX 2008 defines some additional locale handling
functions that can be applied at a thread (or even smaller) level. I
think, and I think he agrees, that the eventual solution is to convert
to use these on platforms that have them. Then, calling
POSIX::setlocale() would not actually call setlocale(), but instead
these thread-level functions. The implication would be that you could
not change the locale at all in another thread once it was started. I
consider that a bug fix. And the locale documentation would be changed
to indicate that on such platforms you can use locale on threads. One
could use Config to determine if one is operating on such a platform.
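
For reference, the POSIX 2008 functions in question are newlocale(),
uselocale() and freelocale(). A minimal sketch of how a per-thread
stand-in for setlocale(LC_ALL, name) might look is below. The name
thread_set_locale() is hypothetical, this is not a proposal for the
actual implementation, and it assumes a platform that declares these
functions in <locale.h> (some, such as Darwin, use <xlocale.h>).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical per-thread stand-in for setlocale(LC_ALL, name).
 * Only the calling thread is affected; the global locale that
 * setlocale() manipulates is left alone.
 * Returns 0 on success, -1 if the locale could not be created. */
int thread_set_locale(const char *name)
{
    locale_t fresh, old;

    assert(name != NULL);

    fresh = newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (fresh == (locale_t)0)
        return -1;

    /* Install the new locale object for this thread only. */
    old = uselocale(fresh);

    /* Free a previously installed per-thread locale, but never the
     * special handle that stands for the global locale. */
    if (old != (locale_t)0 && old != LC_GLOBAL_LOCALE)
        freelocale(old);
    return 0;
}
```

A thread that never calls uselocale() keeps following the global
locale, which is why uselocale((locale_t)0) returns LC_GLOBAL_LOCALE
until the first successful call to the helper.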
The short term solution appears to be to use mutexes to prevent control
from being transferred to another thread during the interval when the
locale is in flux. This would also be the final solution for platforms
that don't have the thread-level locale functions.
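
As a rough sketch of the mutex approach for the errno case, assuming
pthreads: the lock name locale_mutex and the helper errno_text_in_c()
are made up for illustration, and the scheme only helps if every piece
of code that changes or depends on the locale takes the same lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical process-wide lock guarding every locale change. */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the C-locale (English) text for `errnum` into buf.
 * Returns 0 on success, -1 if the locale could not be switched. */
int errno_text_in_c(int errnum, char *buf, size_t len)
{
    char saved[256];
    const char *cur;
    int ok = -1;

    assert(buf != NULL && len > 0);

    pthread_mutex_lock(&locale_mutex);

    cur = setlocale(LC_MESSAGES, NULL);
    if (cur != NULL && strlen(cur) < sizeof saved) {
        strcpy(saved, cur);             /* remember what to restore */
        if (setlocale(LC_MESSAGES, "C") != NULL) {
            /* Copy the text out while still holding the lock. */
            snprintf(buf, len, "%s", strerror(errnum));
            setlocale(LC_MESSAGES, saved);
            ok = 0;
        }
    }

    pthread_mutex_unlock(&locale_mutex);
    return ok;
}
```

The critical section covers the query, the switch, the use and the
restore, so no cooperating thread can observe LC_MESSAGES in its
temporary state.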
We need to do this in 5.24 (it is a blocker) for the case of a program
that nominally doesn't use locales at all. It is pretty uncommon to
stringify an errno, so performance shouldn't be an issue for the case in
the ticket.
A question I have is what else should go into 5.24.
Tony realized something that I and my predecessors hadn't, and that is,
when you query what a locale is, that locale can be changed out from
under you by another thread, and hence your subsequent calls that assume
that locale are invalid. I *think* that just querying can cause a
segfault on at least Darwin if another thread is manipulating the locale
at the same time. I am at a loss, otherwise, to explain the segfaults
we were seeing under some circumstances there.
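
To illustrate the query problem: setlocale(category, NULL) returns a
pointer into storage that a later setlocale() call may overwrite, so a
defensive query would copy the name out while holding a lock. A sketch,
assuming pthreads and a hypothetical lock shared with all code that
sets the locale:

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical lock; it would have to be the same one taken by any
 * code that changes the locale. */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the current locale name for `category` into `out`.
 * Returns 0 on success, -1 if the query failed or `out` is too small. */
int query_locale(int category, char *out, size_t len)
{
    const char *cur;
    int ok = -1;

    assert(out != NULL);

    pthread_mutex_lock(&locale_mutex);
    cur = setlocale(category, NULL);    /* query only, no change */
    if (cur != NULL && strlen(cur) < len) {
        strcpy(out, cur);               /* private copy, safe after unlock */
        ok = 0;
    }
    pthread_mutex_unlock(&locale_mutex);
    return ok;
}
```

The copy protects the caller's string, but not the caller's
assumptions: once the lock is released, another thread can still change
the locale before the caller acts on the name it was given.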
14 hours ago, it seemed to me and Tony that mutexes were required even
for platforms that have the thread-level locale functions. Since then,
I believe I discovered a way around that.
I have done an audit of other places where there are potential issues.
Which of these should also go into 5.24?
cygwin changes locales when converting some UTF-8 stuff. I do not know
what releases of cygwin, if any, have the thread-level locale functions.
A fix for 5.24 could be just using the mutexes. I *suspect* that this
is done even for programs that don't ever do a 'use locale', so I think
this should go in 5.24, even though we have no reports about it. Who
knows? Some reported failures could actually be this bug.
Under 'use locale', there is sometimes the need to decide if the current
locale is a UTF-8 one. And in cases where the categories don't all have
the same locale, and the desired category isn't LC_CTYPE, the locale is
changed. It is relatively uncommon for the categories to have different
locales. I don't know about 5.24 for this.
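
One way such a check could be written in C is sketched below; it is an
assumption about the general technique, not a copy of what perl does.
nl_langinfo(CODESET) reports the codeset of LC_CTYPE only, so for any
other category LC_CTYPE is temporarily switched to that category's
locale. The helper name category_is_utf8() is hypothetical, and the
comparison assumes the platform spells the codeset "UTF-8".

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the locale of `category` uses
 * UTF-8, 0 if it does not, -1 on failure. `category` is expected to be
 * a single category such as LC_TIME, not LC_ALL. */
int category_is_utf8(int category)
{
    char wanted[256], saved[256];
    const char *p;
    int result = -1;

    assert(category != LC_ALL);

    p = setlocale(category, NULL);
    if (p == NULL || strlen(p) >= sizeof wanted)
        return -1;
    strcpy(wanted, p);

    p = setlocale(LC_CTYPE, NULL);
    if (p == NULL || strlen(p) >= sizeof saved)
        return -1;
    strcpy(saved, p);

    /* This is the window in which other threads see the wrong
     * LC_CTYPE, unless it is protected in some way. */
    if (setlocale(LC_CTYPE, wanted) != NULL) {
        result = strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
        setlocale(LC_CTYPE, saved);
    }
    return result;
}
```

When all categories share one locale, the switch changes nothing
visible; the exposure arises only in the mixed-locale case described
above.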
Also under 'use locale', sometimes a warning message is printed when
it's discovered there are bugs in the locale definition. This is
extremely rare, so might not be worth putting in 5.24. On the other
hand, it is simple to do, and performance isn't an issue due to its rarity.
The most problematic case is for the LC_NUMERIC issue described above.
We have not gotten tickets to my knowledge attributable to this bug, and
this has been in the design from likely Day 1. On the other hand, the
design had been implemented incorrectly until I fixed it some releases
ago, and I don't remember if those fixes would have changed this or not.
And, also, the scope that the mutexes would be on is larger than the
other cases I've mentioned. Functions that do this are in the public
API, without any cautions about their needing to be of short-term use.
So someone could call them and set up a long term mutex. And you must
have done a 'use locale', I *think*, for there to be a problem.
And finally, it might be that on some platforms, just querying a locale
can lead to segfaults. I think that must be part of what is going on in
this ticket. My interpretation of something Tony said from reading the
glibc code is that this shouldn't be an issue for that version of
setlocale(). We could put mutexes around querying locales, and maybe
setting. Performance might be an issue.