develooper Front page | perl.perl5.porters | Postings from June 2017

Perl_sv_vcatpvfn_flags reworking

Thread Next
From:
Dave Mitchell
Date:
June 5, 2017 10:27
Subject:
Perl_sv_vcatpvfn_flags reworking
Message ID:
20170605102702.GG3083@iabyn.com
I've just pushed a branch, smoke-me/davem/sprintf, which contains
104 commits that heavily rework the Perl_sv_vcatpvfn_flags() function.

This sprawling 1700-line function provides the low-level sprintf
functionality to all the various other sprintf-ish perl and C functions in
core. It has grown piecewise over the years and has become a bit of a
mess. In particular it was hard to determine whether any buffer overflows
were lurking.

My branch, which I intend to merge into blead in a few days time, does
the following:

Bug fixes:

    Size modifiers with integer Inf/Nan, e.g. sprintf("%hi", Inf)
    used to print spurious warnings.

    The very-special-cased "%.NNNg" code has been removed, and is now
    handled by the  special-cased "...%.NNNg..." code instead, which makes
    it about 4% slower, but handles unicode radix points better.

    "%n" with a missing arg now croaks with a specific error message
    (previously it croaked about modifying a read-only variable).

    sprintf("%p", 0+Inf) now prints the address of an SV, not the literal
    string "Inf" (or "NaN").

    It now warns on a missing explicit width or vector arg, e.g. "%*2$d"
    or "%3$vd".

    Widths and precisions specified via an argument used to wrap; for
    example, this used to output "  abc"; it now croaks with "Integer
    overflow in format string":

        $w = 0x100000005; printf "%*s", $w, "abc";

    Illegal formats didn't work well with utf8. It now resets to the
    character following the '%' and continues from there.

    The locale-specific radix point code now uses
    SvCUR(PL_numeric_radix_sv) rather than SvLEN(PL_numeric_radix_sv) for
    buffer size allocation.  Also, several places weren't taking into
    account the length of the radix point string when calculating buffer
    sizes; they were only being saved by the general 'add 40' fudge
    factor.


Buffer sizing and length calculation overflow/wraps:

    The code has been thoroughly audited to look for any issues where the
    format and/or args are malicious, and some possible (although not
    exploitable) wraps have been fixed.

    int's and I32's etc have been replaced with STRLEN where appropriate.

    Tests for overflow/wrap have been added to the test suite.

    The code has been cleaned up a lot and better commented, so it will be
    easier to avoid wrapping in future.


Optimisations:

    The special-cased "...%.0f..." is now detected and handled earlier in
    the loop, so is about 35% faster.

    It now sets and restores locale only once per function call, not
    multiple times per format element.

    Generally the code has been heavily tweaked.
    
    On average, the newly-added benchmarks are about 10-15% faster; the
    mixed-utf8 benchmark is about about 80% faster.

Other changes:

    The HAS_LDBL_SPRINTF_BUG code that worked around a 2002 Irix 6
    bug has been removed.

    The 300-line block of code which handles %a hex floating-point formats
    has been moved into a separate static function.

    The 'svmax' arg to the sv_vcatpvfn() family has been changed from I32
    to STRLEN and renamed to 'sv_count'.

    A "Redundant argument in printf" warning is no longer emitted after an
    invalid % format element has been encountered. This makes the code
    simpler, and since you get an "Invalid conversion in printf" warning
    anyway, does no harm (and in fact may be less confusing).

    Several locals vars have been eliminated and the scope of many others
    has been reduced.


-- 
"Procrastination grows to fill the available time"
    -- Mitchell's corollary to Parkinson's Law

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About