
oct() and hex()

Nicholas Clark
August 31, 2001 15:14
With pp_divide and pp_modulo now preserving UVs when UVs are larger than the
NV mantissa, as far as I know the only 3 operators left that lose bits via
NVs are oct, hex and unpack "%64". For example:

perl -le 'printf "%x\n", $_ foreach (0x123456789abcdef, hex "123456789abcdef")'
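The same loss can be reproduced outside perl. A minimal C sketch, assuming an
IEEE 754 double (53-bit mantissa) standing in for NV and uint64_t standing in
for UV; the helper name is made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: push a 64-bit unsigned value through a double and
 * back, which is what happens when a UV is carried in an NV.
 * Assumes IEEE 754 doubles (53-bit mantissa) and round-to-nearest.
 * The caller must keep uv well below 2**64 so the cast back is defined. */
static uint64_t
round_trip_via_double(uint64_t uv)
{
    double nv = (double)uv;   /* may round: only 53 significant bits survive */
    return (uint64_t)nv;
}
```

With the value from the one-liner, round_trip_via_double(0x123456789abcdefULL)
gives 0x123456789abcdf0: the number needs 57 bits, so the low 4 bits are
rounded away.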

Underlying hex and oct are 3 functions, scan_bin, scan_oct, and scan_hex that
currently return their result in an NV. They are prototyped like this:

NV Perl_scan_bin(pTHX_ char *start, STRLEN len, STRLEN *retlen)
NV Perl_scan_oct(pTHX_ char *start, STRLEN len, STRLEN *retlen)
NV Perl_scan_hex(pTHX_ char *start, STRLEN len, STRLEN *retlen)

and are currently virtually undocumented, apart from this cross reference
in perlclib.pod:

    Notice also the C<scan_bin>, C<scan_hex>, and C<scan_oct> functions in
    F<util.c> for converting strings representing numbers in the respective
    bases into C<NV>s.

[patch at end, as they are now in numeric.c]

The calling convention is fairly self-explanatory, except for retlen: on entry
*retlen is expected to be 1 to allow underscores in the number, 0 to disallow;
on return it holds the length actually scanned.

A typical call (e.g. in regcomp.c) is:

				numlen = 1;	/* allow underscores */
				ender = (UV)scan_hex(p + 1, e - p - 1, &numlen);

and actually all calls apart from pp_oct and pp_hex currently immediately cast
back to UV. Worth noting is that the (UV) cast is undefined behaviour for any
NV >= (UV_MAX + 1) or <= -1, which is relevant below.
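For illustration, a range check that makes the cast safe might look like the
following. This is only a sketch, assuming a 64-bit UV (uint64_t) and a double
NV; the function name is hypothetical and nothing like it is in the current
source:

```c
#include <stdint.h>

/* Hypothetical helper: convert a double to uint64_t only when the C standard
 * defines the result, i.e. when the truncated value fits. Returns 1 and
 * stores the value on success, 0 otherwise (including NaN). */
static int
double_to_u64_checked(double nv, uint64_t *out)
{
    /* 18446744073709551616.0 is 2**64, i.e. UV_MAX + 1 for a 64-bit UV;
     * it is exactly representable as a double. */
    if (nv > -1.0 && nv < 18446744073709551616.0) {
        *out = (uint64_t)nv;   /* truncates toward zero, so -0.5 gives 0 */
        return 1;
    }
    return 0;   /* out of range or NaN: the plain cast would be undefined */
}
```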

The requirement seems to be for scanning functions that take a pointer, length
pair to define a region of memory to scan, flags (currently only underscores
(dis)?allowed) and return a number. Currently the number is returned as an NV.
I'm proposing to provide three new scanning functions (names?) to return the
result either as an NV or a UV. This will actually avoid the undefined
behaviour for pathologically long hex values on platforms where
sizeof(UV) >= sizeof(NV). scan_hex etc. will remain for binary
compatibility, and will be implemented with calls to the three new functions.

The current internal implementation actually starts off using UVs, and flips
over to NVs if the UVs overflow, casting the UV to NV on return if it has
not overflowed.

I'm proposing an API like this:

UV Perl_grok_hex(pTHX_ char *start, STRLEN *len, I32 *flags, NV *result)

On entry:
  start is the address to scan (as before)
  *len is the length to scan (as before, but now passed as a pointer)
  *flags are flags to affect the scan (currently only underscores)
  result is a pointer to NV or NULL.

On return:
  *len is the length of scanned string (currently retlen)
  *flags are result flags (initially only result_overflowed_uv)
  *result is the value, set only if result is non-NULL and
  result_overflowed_uv is true

and the function returns the scanned number (if it did not overflow) or
UV_MAX if result_overflowed_uv is true.
[UV_MAX is one possible outcome of the undefined (UV) cast in regcomp.c.
(SIGFPE is another...)]
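To make the proposed contract concrete, here is a standalone sketch of it. It
is not the perl implementation: uint64_t and double stand in for UV and NV,
the function name and flag values are invented, and underscore handling is
simplified (any underscore is skipped when allowed). It uses the same
strategy as the current code: accumulate in the integer, and flip over to the
floating point value only when another digit would overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real names and bits are up for discussion. */
#define SKETCH_ALLOW_UNDERSCORES    0x01   /* in:  skip '_' in the number */
#define SKETCH_GREATER_THAN_UV_MAX  0x02   /* out: value did not fit */

/* Returns 0..15 for a hex digit, -1 for anything else. */
static int
hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static uint64_t
grok_hex_sketch(const char *start, size_t *len, int *flags, double *result)
{
    const int allow_underscores = *flags & SKETCH_ALLOW_UNDERSCORES;
    uint64_t uv = 0;
    double nv = 0.0;
    int overflowed = 0;
    size_t i;

    for (i = 0; i < *len; i++) {
        int digit = hex_digit_value(start[i]);

        if (digit < 0) {
            if (start[i] == '_' && allow_underscores)
                continue;           /* simplification: any '_' is skipped */
            break;                  /* stop at the first other character */
        }
        if (!overflowed) {
            if (uv <= (UINT64_MAX >> 4)) {   /* room for 4 more bits */
                uv = (uv << 4) | (uint64_t)digit;
                continue;
            }
            overflowed = 1;                  /* flip over to the double */
            nv = (double)uv;
        }
        nv = nv * 16.0 + digit;
    }

    *len = i;                                /* length actually scanned */
    *flags = overflowed ? SKETCH_GREATER_THAN_UV_MAX : 0;
    if (overflowed) {
        if (result)
            *result = nv;
        return UINT64_MAX;
    }
    return uv;
}
```

A caller that does not care about overflow passes NULL for result and simply
uses the return value; a caller that does care checks the output flag and
reads *result instead.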

The regcomp.c code becomes

  flags = PERL_SCAN_ALLOW_UNDERSCORES;	/* was numlen = 1 */
  numlen = e - p - 1;
  ender = grok_hex(p + 1, &numlen, &flags, NULL);

which preserves the current semantics of not actually caring if the hex value
overflows a UV.

pp_hex becomes

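    dSP; dTARGET;
    char *tmps;
    STRLEN len;
    I32 flags = PERL_SCAN_ALLOW_UNDERSCORES;  /* assumed input flag */
    NV resultd;
    UV resultu;

    tmps = (SvPVx(POPs, len));
    resultu = grok_hex (tmps, &len, &flags, &resultd);
    if (flags & PERL_SCAN_GREATER_THAN_UV_MAX) {
        XPUSHn(resultd);    /* overflowed a UV: push the NV */
    }
    else {
        XPUSHu(resultu);    /* fits in a UV: no bits lost */
    }
    RETURN;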

and backwards compatibility is via

NV Perl_scan_hex(pTHX_ char *start, STRLEN len, STRLEN *retlen)
{
    NV rnv;
    I32 flags = *retlen ? PERL_SCAN_ALLOW_UNDERSCORES : 0;
    UV ruv = grok_hex (start, &len, &flags, &rnv);

    *retlen = len;
    return (flags & PERL_SCAN_GREATER_THAN_UV_MAX) ? rnv : (NV)ruv;
}

[in worst Usenet tradition none of the above has been run through a compiler.]

Comments? Suggestions?

Nicholas Clark

--- pod/perlclib.pod.orig       Tue Feb 13 02:30:12 2001
+++ pod/perlclib.pod    Fri Aug 31 22:39:23 2001
@@ -166,7 +166,7 @@
     strtoul(s, *p, n)           Strtoul(s, *p, n)
 Notice also the C<scan_bin>, C<scan_hex>, and C<scan_oct> functions in
-F<util.c> for converting strings representing numbers in the respective
+F<numeric.c> for converting strings representing numbers in the respective
 bases into C<NV>s.
 In theory C<Strtol> and C<Strtoul> may not be defined if the machine perl is
