
SupernummifragilisticXPialidocious (was: [perl #2783] Security of ARGV using 2-argument open)

From: Tom Christiansen
Date: July 27, 2008 16:42
Subject: SupernummifragilisticXPialidocious (was: [perl #2783] Security of ARGV using 2-argument open)
Message ID: 19382.1217202125@chthon
In-Reply-To: Message from Zefram <zefram@fysh.org>
   of "Sat, 26 Jul 2008 12:03:14 BST." <20080726110314.GY6303@fysh.org>

> Mark Mielke wrote:

>> I don't understand. What is /$/ reasonably supposed to do?

> It is very frequently used in an attempt to anchor to the end of the
> string.  Such as

>       die "invalid keyword" unless $keyword =~ /^(foo|bar|baz)$/;

> In this context the programmer usually doesn't intend to accept "foo\n".
> Newline is a shell metacharacter, of course, and often significant in
> file formats, so there's lots of scope for breakage.

>> Please point out a real-life
>> program that uses /$/ that isn't mere theory.

Well, I once had to fix File::Find and many and various other standard
filename-related modules because they naively used /./ when they
needed /./s and /$/ when they needed /\z/, causing them to break on
certain directories with newlines in them.
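
For anyone who hasn't been bitten yet, here is a minimal sketch of both
traps.  The sub names (ok_dollar, ok_z, all_dots, all_dots_s) are made
up for illustration and come from no module.

```perl
use strict;
use warnings;

# $ matches at the very end of the string OR just before a trailing
# newline; \z matches only at the very end.
sub ok_dollar { my ($kw) = @_; return $kw =~ /^(?:foo|bar|baz)$/   ? 1 : 0 }
sub ok_z      { my ($kw) = @_; return $kw =~ /\A(?:foo|bar|baz)\z/ ? 1 : 0 }

# Without /s, the dot refuses to match a newline.
sub all_dots   { my ($name) = @_; return $name =~ /\A.+\z/  ? 1 : 0 }
sub all_dots_s { my ($name) = @_; return $name =~ /\A.+\z/s ? 1 : 0 }

for my $kw ("foo", "foo\n") {
    (my $shown = $kw) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-7s  \$: %d  \\z: %d\n", qq<"$shown">, ok_dollar($kw), ok_z($kw);
}
```

The $ version accepts "foo\n" and the \z version rejects it; likewise a
name such as "a\nb" only matches the dotted pattern once /s is added.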

Still, what can you do?  People make ugly things, and nothing tells 
them not to.  Due to just exactly what Johan has mentioned, I've 
trained myself to pretty much always run:

    % find DIR .... -print0 | xargs -0 ....

instead of 

    % find DIR .... -print  | xargs    ....
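
When the thing on the right of the pipe is a Perl program rather than
xargs, the same idea can be sketched by setting the input record
separator to NUL.  read_nul_names is a hypothetical helper name, and
the in-memory filehandle merely stands in for find's output.

```perl
use strict;
use warnings;

# Read NUL-terminated records, as produced by find ... -print0.
# (read_nul_names is an illustrative name, not a library function.)
sub read_nul_names {
    my ($fh) = @_;
    local $/ = "\0";      # records end at NUL, not at newline
    my @names = <$fh>;
    chomp @names;         # chomp removes the current $/, here the NUL
    return @names;
}

# Simulate find's output with an in-memory filehandle.
my $data = "plain.txt\0with\nnewline.txt\0";
open(my $fh, "<", \$data) or die "can't open in-memory file: $!";
my @names = read_nul_names($fh);
close($fh);

print scalar(@names), " names\n";
```

In a real pipeline one would presumably pass \*STDIN instead of the
in-memory handle.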

> I presume you mean a real-life program that misbehaves due to misuse
> of /$/.  On a quick look through /usr/local/share/perl, I found this
> in Carp::Clan:

> : unless ( /^-?\d+(?:\.\d+(?:[eE][+-]\d+)?)?$/
> :     )    # Looks numeric
> : {
> :     s/([\\\'])/\\$1/g;    # Escape \ and '
> ...

> This is used when displaying function arguments in a stack trace: it's
> trying to show numeric values as unquoted numbers and any other defined
> value as a quoted string.  So these are how it's meant to work:

> $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("abc")'
> Carp::Clan::__ANON__(): a at -e line 1
>         main::foo('abc') called at -e line 1
> $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("abc\n")'
> Carp::Clan::__ANON__(): a at -e line 1
>         main::foo('abc\x0A') called at -e line 1
> $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("123")'
> Carp::Clan::__ANON__(): a at -e line 1
>         main::foo(123) called at -e line 1
> $

> And this one goes wrong:

> $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("123\n")'
> Carp::Clan::__ANON__(): a at -e line 1
>         main::foo(123
> ) called at -e line 1
> $

> OK, this one's not likely to produce a security hole, but it's just the
> first instance I set eyes on.  It's a very common antipattern.

Antipattern?

Anyway, /^-?\d+(?:\.\d+(?:[eE][+-]\d+)?)?$/ isn't the only thing
that gets this "wrong".  Perl considers "123\n" a number fine
and proper, emitting nary a peep upon its consideration as such.

    % perl -WE 'say "123\n" * 2'
    246

Do you somehow disagree?  Perhaps 0+ should be added to Carp::Clan.
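
If one did want the stricter reading instead, where "123\n" is shown
as a quoted string, a rough sketch might look like this.  format_arg is
a hypothetical stand-in and is NOT Carp::Clan's actual code; only the
numeric pattern is taken from the snippet quoted above, with the
anchors swapped for \A and \z.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the argument formatter; not Carp::Clan's code.
sub format_arg {
    my ($arg) = @_;

    # Same numeric pattern as quoted above, but \z refuses a trailing newline.
    return $arg if $arg =~ /\A-?\d+(?:\.\d+(?:[eE][+-]\d+)?)?\z/;

    (my $s = $arg) =~ s/([\\'])/\\$1/g;                    # escape \ and '
    $s =~ s/([^\x20-\x7E])/sprintf("\\x%02X", ord $1)/ge;  # non-printables as \xNN
    return "'$s'";
}

print format_arg($_), "\n" for "123", "123\n", "abc\n";
```

Whether that is more correct than numifying with 0+ is exactly the
matter under dispute, so take it as one option and no more.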

And lest you think Perl's completely bonkers, it doesn't particularly
like (nor, unfortunately, altogether dislike) other gunk following
after that newline:

    % perl -WE 'say "123 men\n" * 2'
    Argument "123 men\n" isn't numeric in multiplication (*) at -e line 1.
    246

The limits of this can be readily demonstrated with the following
little program.  But it gives very different answers and kvetches
depending on how you run it.

    use strict;
    use warnings;
    use Scalar::Util qw[ looks_like_number ];
    $| = 1;
    my @strings = ( "123", "123\n",
                    "1", "1 ", " 1", " 1 ", " 1 \n",
                    "0", "+0", "-0", "00", "00.00",
                    "Inf", "-Inf", "+Inf",
                    "NaN", "nAn", "Nan ",
                    "6.02e23", "9e555",
                    "037", "0xFF",
                    "3 blind mice",
                );
    for my $str (@strings) {
        printf("%-15s looks %s a number (nummishly %s)\n",
               qq<"$str">,
                      looks_like_number($str)
                          ? "like"
                          : "UNLIKE",
                                               (0+$str),
              );
    }

First, the output when run through perl5.8.7; note the dubious
operations upon NaN and Inf, and the useful warnings.

    "123"           looks like a number (nummishly 123)
    Newline in left-justified string for printf at /tmp/num line 19.
    "123
    "          looks like a number (nummishly 123)
    "1"             looks like a number (nummishly 1)
    "1 "            looks like a number (nummishly 1)
    " 1"            looks like a number (nummishly 1)
    " 1 "           looks like a number (nummishly 1)
    Newline in left-justified string for printf at /tmp/num line 19.
    " 1
    "          looks like a number (nummishly 1)
    "0"             looks like a number (nummishly 0)
    "+0"            looks like a number (nummishly 0)
    "-0"            looks like a number (nummishly 0)
    "00"            looks like a number (nummishly 0)
    "00.00"         looks like a number (nummishly 0)
    "Inf"           looks like a number (nummishly 0)
    "-Inf"          looks like a number (nummishly 0)
    "+Inf"          looks like a number (nummishly 0)
    "NaN"           looks like a number (nummishly 0)
    "nAn"           looks like a number (nummishly 0)
    "Nan "          looks like a number (nummishly 0)
    "6.02e23"       looks like a number (nummishly 6.02e+23)
    "9e555"         looks like a number (nummishly Inf)
    "037"           looks like a number (nummishly 37)
    Argument "0xFF" isn't numeric in addition (+) at /tmp/num line 19.
    "0xFF"          looks UNLIKE a number (nummishly 0)
    Argument "3 blind mice" isn't numeric in addition (+) at /tmp/num line 19.
    "3 blind mice"  looks UNLIKE a number (nummishly 3)

Now the output when run through perl5.10.0; note that the
warning about newlines is gone:

    "123"           looks like a number (nummishly 123)
    "123
    "          looks like a number (nummishly 123)
    "1"             looks like a number (nummishly 1)
    "1 "            looks like a number (nummishly 1)
    " 1"            looks like a number (nummishly 1)
    " 1 "           looks like a number (nummishly 1)
    " 1
    "          looks like a number (nummishly 1)
    "0"             looks like a number (nummishly 0)
    "+0"            looks like a number (nummishly 0)
    "-0"            looks like a number (nummishly 0)
    "00"            looks like a number (nummishly 0)
    "00.00"         looks like a number (nummishly 0)
    "Inf"           looks like a number (nummishly 0)
    "-Inf"          looks like a number (nummishly 0)
    "+Inf"          looks like a number (nummishly 0)
    "NaN"           looks like a number (nummishly 0)
    "nAn"           looks like a number (nummishly 0)
    "Nan "          looks like a number (nummishly 0)
    "6.02e23"       looks like a number (nummishly 6.02e+23)
    "9e555"         looks like a number (nummishly Inf)
    "037"           looks like a number (nummishly 37)
    Argument "0xFF" isn't numeric in addition (+) at /tmp/num line 19.
    "0xFF"          looks UNLIKE a number (nummishly 0)
    Argument "3 blind mice" isn't numeric in addition (+) at /tmp/num line 19.
    "3 blind mice"  looks UNLIKE a number (nummishly 3)

Finally, here's the output when run through perl5.10.0 -Mbignum,
where some things improve:

    "123"           looks like a number (nummishly 123)
    "123
    "          looks like a number (nummishly 123)
    "1"             looks like a number (nummishly 1)
    "1 "            looks like a number (nummishly 1)
    " 1"            looks like a number (nummishly 1)
    " 1 "           looks like a number (nummishly 1)
    " 1
    "          looks like a number (nummishly 1)
    "0"             looks like a number (nummishly 0)
    "+0"            looks like a number (nummishly 0)
    "-0"            looks like a number (nummishly 0)
    "00"            looks like a number (nummishly 0)
    "00.00"         looks like a number (nummishly 0)
    "Inf"           looks like a number (nummishly NaN)
    "-Inf"          looks like a number (nummishly NaN)
    "+Inf"          looks like a number (nummishly NaN)
    "NaN"           looks like a number (nummishly NaN)
    "nAn"           looks like a number (nummishly NaN)
    "Nan "          looks like a number (nummishly NaN)
    "6.02e23"       looks like a number (nummishly 602000000000000000000000)
    "9e555"         looks like a number (nummishly 9000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
    "037"           looks like a number (nummishly 37)
    "0xFF"          looks UNLIKE a number (nummishly 255)
    "3 blind mice"  looks UNLIKE a number (nummishly NaN)

So although some aspects improve with bignum, others suffer.  If "0xFF"
looks UNLIKE a number, then why pray tell is 0+it reporting 255?  And why
special treatment for 0xFF but not for 037, which 0+ doesn't convert to 31?
I suppose I can see some thin thread of sense here, but I may have to cross 
my eyes to do so.
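
Without bignum, the usual way to get hex and octal strings converted
is to ask for it explicitly with oct(), which understands the 0x and 0b
prefixes as well as plain leading-zero octal.  A small sketch, where
to_number is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: honor 0x / 0b / leading-0 prefixes in strings,
# which plain numification (0 + $str) does not.
sub to_number {
    my ($str) = @_;
    return oct($str)
        if $str =~ /\A0(?:[xX][0-9a-fA-F]+|[bB][01]+|[0-7]+)\z/;
    return 0 + $str;
}

printf "%-6s => %s\n", $_, to_number($_) for "0xFF", "037", "0b101", "37";
```

Here "0xFF" comes out as 255 and "037" as 31, whereas 0+"037" gives 37.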

Reminds me a bit of this perplexity:

    DB<1> $h{0xff} = 1
    DB<2> $h{040}  = 2
    DB<3> $h{32}   = 3

How many hash keys?  2 or 3?

    DB<4> x \%h
    0  HASH(0x3c3fcd34)
       255 => 1
       32 => 3

Or with this:

    DB<5> $h{time}   = 4
    DB<6> $h{ time } = 5
    DB<7> $h{date}   = 6
    DB<8> $h{ date } = 7

How many more keys, and how many of those are alphas?  

    DB<9> x \%h
    0  HASH(0x3c3fcd34)
       255 => 1
       32 => 3
       'date' => 7
       'time' => 5

I don't mean that I'm particularly perplexed.  But you'd be
astonished at all the folks at TPC last week who guessed this
would do something other than it did: (well, virtually) *everybody*,
including many people you'd've thought would've known better. :-)

Why is it doing this?  

Because to earn "bareword treatment", it must be a legal identifier
starting with an alphabetic or underscore, and only *then* does it get to
be alphanumunders thereafter.  That means date and time are treated as
string literals, that is, "date" and "time" (not as "date" and time(), as
one venerable illustriarch had vainly hoped); whereas the numbers are
treated not as string literals but rather as numeric ones, for they start
with neither an alpha *nor* an underline.
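
The same rule written out as a script, as a sketch only.  The hash
names %h and %t are arbitrary; the last two assignments show the two
usual ways to force a function call inside the braces.

```perl
use strict;
use warnings;

my %h;
$h{0xff}   = 1;    # numeric literal: the key is "255"
$h{040}    = 2;    # octal literal:   the key is "32"
$h{32}     = 3;    # same key "32", so this overwrites the 2
$h{time}   = 4;    # simple identifier: the key is the string "time"
$h{ time } = 5;    # still the string "time"

my %t;
$t{ time() } = 1;  # parentheses force the call to time()
$t{ +time }  = 1;  # so does a unary plus

print join(", ", sort keys %h), "\n";
```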

    % perl -Mcharnames=:full -E'say 32 =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    0

Hey, it's better than /^[^\d\W]/!

    % perl -Mcharnames=:full -E'say "0xFF" =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    0
    % perl -Mcharnames=:full -E'say  0xFF =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    0
    % perl -Mcharnames=:full -E'say   "FF" =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    1

    % perl -Mcharnames=:full,latin -E'say "\N{e WITH CEDILLA AND BREVE}" =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    1

So the {040} is *not* {"040"}, and therefore it's {32}; same
with the {0xff} not being {"0xff"}, but {255}.  The => would
do the same, as this demos:

    % perl -Mcharnames=:full -wE'say [time<=time]<-[0] =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    0

Oops, I meant :-)

    % perl -Mcharnames=:full -wE'say [time=>time]->[0] =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    1
    % perl -Mcharnames=:full -wE'say [time=>time]->[1] =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    0

which is quite unlike

    % perl -Mcharnames=:full -E'say [date=>date]->[0] =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    1
    % perl -Mcharnames=:full -E'say [date=>date]->[1] =~ /^ [ \p{ Alpha } \N{LOW LINE} ] /x || 0'
    1
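
Spelled out as a script, and only as a sketch: the fat comma quotes a
simple identifier to its left, while parentheses or a plain comma leave
time as a function call.  The array names are arbitrary.  (The date
half of the one-liners above needs strict off, since a lone date is a
bareword, so it is left out here.)

```perl
use strict;
use warnings;

my @quoted = (time   => 0);   # => quotes the identifier: the string "time"
my @called = (time() => 0);   # parentheses make it a function call again
my @comma  = (time, 0);       # a plain comma does no quoting at all

print "$quoted[0]\n";
```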

And for your further bemusement:

    % perl -Mcharnames=:full -E'say   _XYZZY_    =~ /^[\pL \N{LOW LINE}]/x || 0'
    1
    % perl -Mcharnames=:full -E'say   _FILE_     =~ /^[\pL \N{LOW LINE}]/x || 0'
    1
    % perl -Mcharnames=:full -E'say   _LINE_     =~ /^[\pL \N{LOW LINE}]/x || 0'
    1
    % perl -Mcharnames=:full -E'say   _DATA_     =~ /^[\pL \N{LOW LINE}]/x || 0'
    1
    % perl -Mcharnames=:full -E'say  __DATA__    =~ /^[\pL \N{LOW LINE}]/x || 0'
:-> %
    % perl -Mcharnames=:full -E'say ___DATA___   =~ /^[\pL \N{LOW LINE}]/x || 0'
    1
    % perl -Mcharnames=:full -E'say  __FILE__    =~ /^[\pL \N{LOW LINE}]/x || 0'
    0
    % perl -Mcharnames=:full -E'say  __LINE__    =~ /^[\pL \N{LOW LINE}]/x || 0'
    0
    % perl -Mcharnames=:full -E'say __PACKAGE__  =~ /^[\pL \N{LOW LINE}]/x || 0'
    1

For a good time(), first do this:

    % alias unperl "perl -MO=Deparse,-q,-x=9,-p"

Then rerun some of those above with unperl instead of with perl.

And have a nice day. :-)

--tom

-- 

    numinous:   Of or pertaining to a numen; divine, spiritual, revealing or
                suggesting the presence of a god; inspiring awe and reverence.

    numismatic: A) Of, pertaining or relating to, coins or coinage
                B) The study of coins and medals, esp. from an
                   archaeological or historical standpoint.

    mummify:    To make into a mummy; to preserve (the bodies of animals) 
                by embalming and drying. Also, to dry into the semblance 
                of a mummy.

    dummify:    A) To undergo a laryngectomy.
                B) To install Windows on a Mac.


