perl.perl5.porters | Postings from May 2013

Re: Does stringification NVs should follow POSIX::setlocale?

From: Nicholas Clark
Date: May 12, 2013 19:45
Subject: Re: Does stringification NVs should follow POSIX::setlocale?
Message ID: 20130512194445.GB3729@plum.flirble.org
On Thu, Apr 25, 2013 at 06:07:11PM -0700, Jan Dubois wrote:
> The code that turns an NV into a PV uses the Gconvert() macro. It will
> always use the current locale, regardless of any locale pragma being
> in effect or not, AFAICT.
> 
> I guess this is the bug then.  The problem of course is that saving
> the current locale, setting the locale to "C", converting the number,
> and then restoring the original locale is quite a bit of overhead for
> the regular use case that isn't using locale.
> 
> So the simple call to Gconvert() in Perl_sv_2pv_flags() would then become:
> 
> #ifdef USE_LOCALE_NUMERIC
>     char *loc = savepv(setlocale(LC_NUMERIC, NULL));
>     setlocale(LC_NUMERIC, "C");
> #endif
>        Gconvert(SvNVX(sv), NV_DIG, 0, s);
> #ifdef USE_LOCALE_NUMERIC
>     setlocale(LC_NUMERIC, loc);
>     Safefree(loc);
> #endif
> 
> That looks rather expensive to me.

It turns out that it's not.


On Sat, Apr 27, 2013 at 10:05:03AM +0200, Steffen Mueller wrote:
> On 04/26/2013 08:03 PM, Jan Dubois wrote:
> > I think this is undesirable, and implicit conversion between strings
> > and numbers should *always* use the "C" locale.
> 
> Yes, please! Locales are just plain old crazy.

Here's the test code for the first 3 examples:

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv) {
    int loop = 0;
    char buffer[15 + 20];
    if (argv[1] && !setlocale(LC_NUMERIC, argv[1])) {
        perror("setlocale");
        return 1;
    }
    do {
        double d = loop + 0.5;
        char *was = setlocale(LC_NUMERIC, NULL);

        if (!was) {
            perror("setlocale");
            return 1;
        }

        if (was[0] == 'C' && was[1] == '\0') {
            gcvt(d, 15, buffer);
        } else {
            if (!setlocale(LC_NUMERIC, "C")) {
                perror("setlocale");
                return 1;
            }

            gcvt(d, 15, buffer);

            if (!setlocale(LC_NUMERIC, was)) {
                perror("setlocale");
                return 1;
            }
        }
    } while (++loop < 10000000);
    puts(buffer);
    return 0;
}


default (no macros defined) - do what we do now, don't call setlocale() at all.
This demonstrates that we get different results depending on the locale:
[nicholas@dromedary-001 test]$ ./locale-default  hr_HR.iso88592
9999999,5
[nicholas@dromedary-001 test]$ ./locale-default
9999999.5

always (-DALWAYS) - always setlocale to "C", and then back to what it was:
[nicholas@dromedary-001 test]$ ./locale-always
9999999.5
[nicholas@dromedary-001 test]$ ./locale-always  hr_HR.iso88592
9999999.5

smarter (-DSMARTER) - always setlocale to "C", but only set it back if it
wasn't C before. Output as before.

smartest (attached code) - setlocale NULL to read the locale, and then
set/restore if it's not C. This one turns out to be most interesting. Output
as before.

dumbbench says

default           Rounded run time per iteration: 6.767e+00 +/- 2.3e-02 (0.3%)
default (hr_HR)   Rounded run time per iteration: 6.755e+00 +/- 1.0e-02 (0.1%)
always            Rounded run time per iteration: 7.030e+00 +/- 1.7e-02 (0.2%)
always (hr_HR)    Rounded run time per iteration: 7.039e+00 +/- 1.7e-02 (0.2%)
smarter           Rounded run time per iteration: 7.006e+00 +/- 1.8e-02 (0.3%)
smarter (hr_HR)   Rounded run time per iteration: 7.020e+00 +/- 1.3e-02 (0.2%)
smartest          Rounded run time per iteration: 6.874e+00 +/- 1.8e-02 (0.3%)
smartest (hr_HR)  Rounded run time per iteration: 1.9302e+01 +/- 1.9e-02 (0.1%)


So the overhead is about 4%. Apart from the variant that looks like it should
be the most efficient ("smartest" under hr_HR), which is about 186% slower
(19.30s vs 6.76s per run). I have no clue why.

Note that the numeric conversion is cached. This is only going to be a
performance hit for code which needs to format a lot of numbers. If it's
a problem, we can also get a speedup by using integer formatting if the
value is safely an integer.

If it's *still* a problem we might like to investigate the Grisu3 algorithm
from http://florian.loitsch.com/publications/dtoa-pldi2010.pdf

I believe it has been coded for V8 and then extracted as
http://code.google.com/p/double-conversion/
It's something like 4 times as fast for the 99% of values which it can cope
with, and we can force it to ignore locales (if the code even deals with
them; I've not looked that hard).

It's in C++, and I believe a 3 clause BSD licence. We could convert it to C.

Also, Python has considered using this, and then rejected it as not worth
the effort: http://bugs.python.org/issue12450

There doesn't seem to be much difference between the two approaches on x86_64
Linux. I'm wondering what would be a useful place to benchmark to see if
there is a difference - Win32? I would have thought that avoiding the
second setlocale() would be a win somewhere. But it is fractionally more code.

Anyway, I think it's worth doing; the question is just which of the two
approaches to use.

Nicholas Clark


