perl.perl5.porters
Re: Does stringification NVs should follow POSIX::setlocale?
From: Nicholas Clark
Date: May 12, 2013 19:45
Subject: Re: Does stringification NVs should follow POSIX::setlocale?
Message ID: 20130512194445.GB3729@plum.flirble.org
On Thu, Apr 25, 2013 at 06:07:11PM -0700, Jan Dubois wrote:
> The code that turns an NV into a PV uses the Gconvert() macro. It will
> always use the current locale, regardless of any locale pragma being
> in effect or not, AFAICT.
>
> I guess this is the bug then. The problem of course is that saving
> the current locale, setting the locale to "C", converting the number,
> and then restoring the original locale is quite a bit of overhead for
> the regular use case that isn't using locale.
>
> So the simple call to Gconvert() in Perl_sv_2pv_flags() would then become:
>
> #ifdef USE_LOCALE_NUMERIC
> char *loc = savepv(setlocale(LC_NUMERIC, NULL));
> setlocale(LC_NUMERIC, "C");
> #endif
> Gconvert(SvNVX(sv), NV_DIG, 0, s);
> #ifdef USE_LOCALE_NUMERIC
> setlocale(LC_NUMERIC, loc);
> Safefree(loc);
> #endif
>
> That looks rather expensive to me.
It turns out that it's not.
On Sat, Apr 27, 2013 at 10:05:03AM +0200, Steffen Mueller wrote:
> On 04/26/2013 08:03 PM, Jan Dubois wrote:
> > I think this is undesirable, and implicit conversion between strings
> > and numbers should *always* use the "C" locale.
>
> Yes, please! Locales are just plain old crazy.
Here's the test code for the first 3 examples:
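#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv) {
    int loop = 0;
    char buffer[15 + 20];

    /* Optionally switch LC_NUMERIC to the locale named on the command line. */
    if (argv[1] && !setlocale(LC_NUMERIC, argv[1])) {
        perror("setlocale");
        return 0;
    }

    do {
        double d = loop + 0.5;
        /* Passing NULL queries the current locale without changing it. */
        char *was = setlocale(LC_NUMERIC, NULL);
        if (!was) {
            perror("setlocale");
            return 0;
        }
        if (was[0] == 'C' && was[1] == '\0') {
            /* Already in the "C" locale: convert directly. */
            gcvt(d, 15, buffer);
        } else {
            /* Switch to "C", convert, then restore the previous locale.
             * Caveat: the C standard allows a later setlocale() call to
             * overwrite the string `was` points to, so a robust version
             * would copy it first (as the savepv() in Jan's snippet does). */
            if (!setlocale(LC_NUMERIC, "C")) {
                perror("setlocale");
                return 0;
            }
            gcvt(d, 15, buffer);
            if (!setlocale(LC_NUMERIC, was)) {
                perror("setlocale");
                return 0;
            }
        }
    } while (++loop < 10000000);

    /* Print only the final conversion, so output does not dominate the timing. */
    puts(buffer);
    return 0;
}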
default (no macros defined) - do what we do now, don't setlocale at all.
This demonstrates that we get different results:
[nicholas@dromedary-001 test]$ ./locale-default hr_HR.iso88592
9999999,5
[nicholas@dromedary-001 test]$ ./locale-default
9999999.5
always (-DALWAYS) - always setlocale to "C", and then back to what it was:
[nicholas@dromedary-001 test]$ ./locale-always
9999999.5
[nicholas@dromedary-001 test]$ ./locale-always hr_HR.iso88592
9999999.5
smarter (-DSMARTER) - always setlocale to "C", but only set it back if it
wasn't C before. Output as before.
smartest (attached code) - setlocale NULL to read the locale, and then
set/restore if it's not C. This one turns out to be most interesting. Output
as before.
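The code for the "always" and "smarter" variants does not appear above. A minimal sketch of what their inner step might look like follows. It assumes the caller already knows the name of the locale it was in (the `prev` parameter is hypothetical, as are the function names), and it uses snprintf() with "%.15g" in place of gcvt() for portability, so it is not the benchmarked code.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* "always": switch to "C", convert, then switch back unconditionally.
 * `prev` is the locale name the caller was using (an assumption of this
 * sketch). Returns 0 on success, -1 if a setlocale() call fails. */
static int format_always(double d, const char *prev, char *buf, size_t len) {
    if (!setlocale(LC_NUMERIC, "C"))
        return -1;
    snprintf(buf, len, "%.15g", d);
    if (!setlocale(LC_NUMERIC, prev))
        return -1;
    return 0;
}

/* "smarter": switch to "C" and convert, but skip the second setlocale()
 * when the previous locale was already "C". */
static int format_smarter(double d, const char *prev, char *buf, size_t len) {
    if (!setlocale(LC_NUMERIC, "C"))
        return -1;
    snprintf(buf, len, "%.15g", d);
    if (strcmp(prev, "C") != 0 && !setlocale(LC_NUMERIC, prev))
        return -1;
    return 0;
}
```

The only difference between the two is the strcmp() guard on the restore, which is the "fractionally more code" that buys the chance to skip one setlocale() call.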
dumbbench says (rounded run time per iteration):
default           6.767e+00 +/- 2.3e-02 (0.3%)
default (hr_HR)   6.755e+00 +/- 1.0e-02 (0.1%)
always            7.030e+00 +/- 1.7e-02 (0.2%)
always (hr_HR)    7.039e+00 +/- 1.7e-02 (0.2%)
smarter           7.006e+00 +/- 1.8e-02 (0.3%)
smarter (hr_HR)   7.020e+00 +/- 1.3e-02 (0.2%)
smartest          6.874e+00 +/- 1.8e-02 (0.3%)
smartest (hr_HR)  1.9302e+01 +/- 1.9e-02 (0.1%)
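So "always" and "smarter" are about 4% slower than the default. The exception
is "smartest", the one that looks like it should be more efficient: it is
under 2% slower when the locale is already C, but about 186% slower under
hr_HR. I have no clue why.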
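For reference, those percentages come from dividing a variant's time by the default time for the same locale. A small sketch of the arithmetic, using the dumbbench figures above (the function names are made up for illustration):

```c
/* Percentage by which `variant` is slower than `baseline`. */
static double slowdown_pct(double baseline, double variant) {
    return (variant / baseline - 1.0) * 100.0;
}

/* "always" against "default", both in the C locale: about 3.9 */
static double always_vs_default(void) {
    return slowdown_pct(6.767, 7.030);
}

/* "smartest" against "default", both under hr_HR: about 185.7 */
static double smartest_hr_vs_default(void) {
    return slowdown_pct(6.755, 19.302);
}
```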
Note that the numeric conversion is cached. This is only going to be a
performance hit for code which needs to format a lot of numbers. If it's
a problem, we can also get a speedup by using integer formatting if the
value is safely an integer.
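The integer shortcut could look roughly like the sketch below. It is not Perl's code: format_nv is a made-up name, and snprintf() with "%.15g" stands in for Gconvert(). The fast path is limited to magnitudes below 1e15, where "%.15g" would print the same plain digits, and because "%lld" never emits a decimal point it should need no setlocale() call at all.

```c
#include <math.h>
#include <stdio.h>

/* Format d into buf; returns what snprintf() returns.
 * Integral values below 1e15 in magnitude take the integer path. */
static int format_nv(double d, char *buf, size_t len) {
    /* NaN and the infinities fail the range test and fall through.
     * Negative zero is excluded so that it still formats as "-0". */
    if (d > -1e15 && d < 1e15 && !(d == 0.0 && signbit(d))) {
        long long i = (long long)d;      /* in range, so the cast is safe */
        if ((double)i == d)              /* no fractional part was lost */
            return snprintf(buf, len, "%lld", i);
    }
    /* General case: locale-sensitive floating-point formatting. */
    return snprintf(buf, len, "%.15g", d);
}
```

Only the fallback branch would then need the set-and-restore dance around it.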
If it's *still* a problem we might like to investigate the Grisu3 algorithm
from http://florian.loitsch.com/publications/dtoa-pldi2010.pdf
I believe that that has been coded for V8 and then extracted as
http://code.google.com/p/double-conversion/
It's something like 4 times as fast for the 99% of values which it can cope
with. And we can force it to ignore locales (if the code even deals with
them; I've not looked that hard).
It's in C++, and I believe under a 3-clause BSD licence. We could convert it to C.
Also, Python has considered using this, and then rejected it as not worth
the effort: http://bugs.python.org/issue12450
There doesn't seem to be much difference between the "always" and "smarter"
approaches on x86_64 Linux. I'm wondering where else it would be useful to
benchmark to see if there is a difference - Win32? I would have thought that
avoiding the second setlocale() would be a win somewhere. But it is
fractionally more code.
Anyway, I think it's worth doing. The only question is which of the two to use.
Nicholas Clark