
[perl #123814] grok_atou() for regexp quantifiers

From: Hugo van der Sanden
Date: February 13, 2015 11:43
Subject: [perl #123814] grok_atou() for regexp quantifiers
Message ID: rt-4.0.18-29327-1423827805-1853.123814-75-0@perl.org
# New Ticket Created by  Hugo van der Sanden 
# Please include the string:  [perl #123814]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=123814 >


AFL (<http://lcamtuf.coredump.cx/afl>) finds this:

% ./miniperl -e '"z" =~ /x|y{01,}/'
miniperl: regexec.c:6443: S_regmatch: Assertion `st->u.curly.min <= st->u.curly.max' failed.
Aborted (core dumped)
% 

This is a symptom of a bigger problem: regpiece() decides whether the regexp fragment about to be parsed is a quantifier by calling regcurly(), which returns TRUE if the fragment matches /{\d+,?\d*}/. If so, it then uses grok_atou() to parse the numbers out, a) assuming that will always succeed, and b) ignoring any range issues.

In particular, that means if a number matches /^0\d/, grok_atou() returns UV_MAX. That value is cast to I32 for the initial checks (which is how the above test got past the normal min <= max check), but is then truncated to unsigned 16 bits when stored (though REG_INFTY assumes signed 16 bits):

% ./miniperl -Dr -e 'qr/x{01,}/' 2>&1 | grep CURLY
   1: CURLY {65535,32767} (5)
% 

.. and if a number is valid but out of range it'll be similarly truncated:

% ./miniperl -Dr -e 'qr/x{7777777777}/' 2>&1 | grep CURLY
   1: CURLY {30833,30833} (5)
% 
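
The two CURLY values shown above are consistent with keeping only the low 16 bits of the parsed number. The following is a minimal sketch of that arithmetic, not the code in regcomp.c: truncate_to_u16() is a hypothetical helper, and a 64-bit UV is assumed, so UINT64_MAX stands in for the value returned on a failed parse.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the Perl source: keep only the low
 * 16 bits of a parsed count, as happens when the value is stored in an
 * unsigned 16-bit field. */
static uint16_t truncate_to_u16(uint64_t parsed)
{
    return (uint16_t)(parsed & 0xFFFFu);
}
```

Under those assumptions, truncate_to_u16(UINT64_MAX) gives 65535, the minimum shown for x{01,}, and truncate_to_u16(7777777777ULL) gives 30833, the count shown for x{7777777777}.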

Fixing this is made harder by the fact that, although there were long-term plans to change it, we currently treat anything that is nearly but not quite a valid quantifier as a literal instead. Changing regcurly() to return FALSE for these cases would mean silently changing the interpretation of existing regexps caught by this to something completely different.

Better would be to raise an error, or at least a warning; but then this becomes the one exception to the "treat as literal if not valid" approach, likely leading to more confusion. Also, regcurly() is called from several places in toke.c and regcomp.c, so it may be tricky to get them to act consistently.
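
Whichever policy is chosen, the callers would need to tell the failure cases apart rather than assume the parse succeeded. The following is a rough sketch of what such a checked parse might look like. Everything in it is hypothetical: parse_count(), the quant_status codes and QUANT_LIMIT are illustrative names, not existing Perl API, and QUANT_LIMIT merely stands in for REG_INFTY using the 32767 seen in the -Dr output.

```c
#include <stddef.h>

/* Hypothetical status codes; none of these exist in the Perl source. */
enum quant_status {
    QUANT_OK,
    QUANT_NOT_NUMBER,    /* no digit at the start */
    QUANT_LEADING_ZERO,  /* matches /^0\d/ */
    QUANT_OUT_OF_RANGE   /* value reaches the assumed limit */
};

/* Assumed stand-in for REG_INFTY. */
#define QUANT_LIMIT 32767UL

/* Parse one decimal count at s. On QUANT_OK, *out holds the value and
 * *end points just past the last digit; otherwise both are untouched. */
static enum quant_status
parse_count(const char *s, const char **end, unsigned long *out)
{
    unsigned long value = 0;
    const char *p = s;

    if (*p < '0' || *p > '9')
        return QUANT_NOT_NUMBER;
    if (*p == '0' && p[1] >= '0' && p[1] <= '9')
        return QUANT_LEADING_ZERO;

    while (*p >= '0' && *p <= '9') {
        /* value is below QUANT_LIMIT here, so this cannot overflow. */
        value = value * 10UL + (unsigned long)(*p - '0');
        if (value >= QUANT_LIMIT)
            return QUANT_OUT_OF_RANGE;
        p++;
    }

    *end = p;
    *out = value;
    return QUANT_OK;
}
```

With this shape, "01,}" and "7777777777}" are reported as distinct failures instead of being truncated, and the caller decides whether each one becomes an error, a warning, or a literal.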

So I'm not sure what to do here.

Hugo

