develooper Front page | perl.perl5.porters | Postings from December 2018

regexec.c: LGTM analysis warns about two comparisons, but weactually need them

Thread Next
From:
James E Keenan
Date:
December 6, 2018 03:29
Subject:
regexec.c: LGTM analysis warns about two comparisons, but weactually need them
Message ID:
20181206032854.21534.qmail@lists-nntp.develooper.com
Here is a case where the LGTM.com analysis of the Perl 5 source code 
produces results that are at first plausible but ultimately incorrect.

The following section of regexec.c is cited 
(https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804) 
with the warning-level alert "Comparison result is always the same":

#####
  9732 STATIC bool
  9733 S_reginclass(pTHX_ regexp * const prog, const regnode * const n, 
const U8* const p, const U8* const p_end      , const bool utf8_target)
  9734 {
  9735     dVAR;
  9736     const char flags = ANYOF_FLAGS(n);
  9737     bool match = FALSE;
  9738     UV c = *p;
...
  9762     /* If this character is potentially in the bitmap, check it */
  9763     if (c < NUM_ANYOF_CODE_POINTS) {
  9764     if (ANYOF_BITMAP_TEST(n, c))
  9765         match = TRUE;
  9766     else if ((flags
  9767                 & 
ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER)
  9768                   && OP(n) == ANYOFD
  9769           && ! utf8_target
  9770           && ! isASCII(c))
  9771     {
  9772         match = TRUE;
  9773     }
  9774     else if (flags & ANYOF_LOCALE_FLAGS) {
  9775         if ((flags & ANYOFL_FOLD)
  9776                 && c < 256
  9777         && ANYOF_BITMAP_TEST(n, PL_fold_locale[c]))
  9778             {
  9779                 match = TRUE;
  9780             }
  9781             else if (ANYOF_POSIXL_TEST_ANY_SET(n)
  9782                      && c < 256
  9783             ) {
  9784
...
  9827                 }
  9828         }
  9829     }
  9830     }
...
#####

At lines 9776 and 9782, the 'c < 256' statements are flagged as 
problematic because "Comparison is always true because c <= 255."

By inserting debugging statements (as suggested by xenu++) and then 
running the test suite, I confirmed that, at least within the test suite 
the value of 'c' ranges between 0 and 255.

I then created a branch (jkeenan/lgtm-regexec-c; 
https://perl5.git.perl.org/perl.git/commitdiff/1a6c6e69bfa70bc55eb0d1e85538d7420659a652) 
in which I simply removed those two comparisons.  Notwithstanding the 
alleged superfluousness of the two comparisons, I got these failures:

#####
Test Summary Report
-------------------
re/regexp_noamp.t                                                (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp.t                                                      (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_nonull.t                                               (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_qr.t                                                   (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_notrie.t                                               (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_qr_embed.t                                             (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_trielist.t                                             (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_qr_embed_thr.t                                         (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
Files=2630, Tests=1181079, 271 wallclock secs (77.61 usr 10.15 sys + 
555.09 cusr 52.55 csys = 695.39 CPU)
Result: FAIL
*** Error code 32
#####

Each of the failing test files uses fixture data kept in t/re/re_tests:

#####
    8 # pat|string|y/n/etc|expr|expected-expr|skip-reason|comment$
...
1829 # These mostly exercize various paths in the optimizer$
1830 /(?l:a?\w)/|b|y|$&|b$
...
1838 /\s?\s/l|\t|y|$&|\t$
1839 /\s?\d/l|3|y|$&|3$
#####

(In regexec.c, what I have represented as pipe-characters above are 
actually hard-tabs.)

So we cannot remove the two comparisons and must somehow inform LGTM 
that these warnings are spurious.

Can anyone explain why these unit tests should preclude removal of the 
two comparisons?

Thank you very much.
Jim Keenan

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About