TONYC TPF grant report #27

From: Tony Cook
Date: February 17, 2014 12:39
Subject: TONYC TPF grant report #27
Message ID: 20140217123932.GB8838@mars.tony.develop-help.com

[Hours]         [Activity]
2014/02/10      Monday
 0.87           #120692 benchmark and comment
 0.73           diagnose HP-UX failures
 3.42           try more diagnosis through debugger, work on setting up
                bisect, start bisect, watch blead succeed and try an
                alternative
=====
 5.02

2014/02/11      Tuesday
 0.55           test khw's latest run/locale.t smoke-me
 0.83           some more HP-UX failure testing, discussion
 0.13           #121039 resolve with comment
 0.30           #121203 review and apply
 0.23           #121018 review and start a comment, but rjbs beats me to
                it
 0.77           #121223 review and comment (also watchdog issue discussion
                in IRC)
 0.22           #121220 comment
 1.15           #121207 review, test and comment
 0.85           #121081 produce a test patch
=====
 5.03

2014/02/12      Wednesday
 4.23           HP-UX debugging, found the immediate issue
 0.53           post describing the HP-UX issue
 0.28           #121236 review, research and comment
 1.20           HP-UX problem and #121236 IRC discussion
 0.85           #120939 review new patch and comment
=====
 7.09

2014/02/13      Thursday
 0.62           p5p catch up
 0.43           #121240 comment, review, apply and comment some more
 1.40           #121223 read response, try to work up a supplementary
                patch
 0.23           #121223 comment
 0.37           #121207 retest, minor fix, apply and comment
=====
 3.05

Which I calculate is 20.19 hours.

Approximately 11 tickets were worked on, and 3 patches were
applied.

The most interesting issue this week was a crash in miniperl with
-Duse64bit builds on HP-UX during the build process. See:

http://www.nntp.perl.org/group/perl.perl5.porters/2014/02/msg212392.html

for Tux's initial report.

The initial backtrace wasn't especially enlightening:

#0  0x4000000000148ba8 in S_ssc_and (pRExC_state=0x800003ffefff2470, ssc=0x800003ffefff2a08, 
    and_with=0x800000010004138c) at regcomp.c:12281
#1  0x400000000016dac0 in S_study_chunk (pRExC_state=0x800000010004138c, 
    scanp=0x800003ffefff2948, minlenp=0x800003ffefff23c8, deltap=0x800003ffefff2a00, 
    last=0x80000001000413b8, data=<value optimized out>, stopparen=-1, recursed_depth=0, 
    and_withp=0x0, flags=dwarf2_read_address: Corrupted DWARF expression.
) at regcomp.c:12281

with the line displayed being in a different function from S_ssc_and:

0x4000000000148ba8 in S_ssc_and (pRExC_state=0x800003ffefff2470, ssc=0x800003ffefff2a08, 
    and_with=0x800000010004138c) at regcomp.c:12281
12281       PERL_ARGS_ASSERT_REGPATWS;

To try and track it down, I first found which source line miniperl was
crashing on:

(gdb) p CopFILE(PL_curcop)
$1 = 0x800000010004bfb0 "lib/strict.pm"
(gdb) p CopLINE(PL_curcop)
$2 = 6

Line 6 of lib/strict.pm is:

unless ( __FILE__ =~ /(^|[\/\\])\Q${\__PACKAGE__}\E\.pmc?$/ ) {

Given this was in regcomp.c, the regexp was the primary suspect, so I
tried a one-liner:

bash-3.1$ ./miniperl -e '/(^|[\/\\])strict\.pmc?$/'
Bus error (core dumped)

I eliminated bits until I had the simplest crashing case, then changed
it a little more to get one that's easier to type:

bash-3.1$ ./miniperl -e '/^|[\/\\]/'
Bus error (core dumped)
bash-3.1$ ./miniperl -e '/^|[ab]/'
Bus error (core dumped)

So what was it failing on?  With the last build options I used, the
crash was at 0x4000000000147764 (marked "here:" below):

0x400000000014774c <S_ssc_and+164>:     ldd 1f0(r1),r25
0x4000000000147750 <S_ssc_and+168>:     ldi 473,r24
0x4000000000147754 <S_ssc_and+172>:     ldo -30(sp),ret1
0x4000000000147758 <S_ssc_and+176>:     b,l 0x4000000000093688 <.stub+72>,rp
0x400000000014775c <S_ssc_and+180>:     copy dp,r4
0x4000000000147760 <S_ssc_and+184>:     copy r4,dp
here:
0x4000000000147764 <S_ssc_and+188>:     ldd 0(r6),r31
0x4000000000147768 <S_ssc_and+192>:     depdi,z -1,31,24,ret0
0x400000000014776c <S_ssc_and+196>:     and r31,ret0,ret0
0x4000000000147770 <S_ssc_and+200>:     addil L%-2000,dp,r1
0x4000000000147774 <S_ssc_and+204>:     ldd 98(r1),r31
0x4000000000147778 <S_ssc_and+208>:     ldd 0(r31),r31
0x400000000014777c <S_ssc_and+212>:     cmpb,*= r31,ret0,0x40000000001477a4 <S_ssc_and+252>

which loads[1] an aligned 64-bit value into r31.

One mistake I made here was assuming it had to be a 64-bit load at
the C level, which turned out not to be the case, so I was looking for
a pointer or IV/UV load and not finding one.

I did a bisect which identified:

commit 710680787cad21825395c0224606ac1535624c52
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sun Jan 12 23:39:43 2014 -0700

    Use bit instead of node for regex SSC

    The flag bits in regular expression ANYOF nodes are perennially in short
    supply.  However there are still plenty of regex nodes possible.  So one
    solution to needing to pass more information is to create a node that
    encapsulates what is needed.  That is what commit
    ...

which seemed unlikely to be the cause of the problem, since it simply
changed what had been a check of the regexp node op code/type (an
8-bit value) into a check of the op code and the offset to the next op
(a 16-bit value).

This led me in circles for a while.

Since the optimizer made it difficult to use the debugger, I resorted
to printf() debugging, adding lines to report the current line number,
which isolated the crash to:

    if (is_ANYOF_SYNTHETIC(and_with)) {

in S_ssc_and.
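
As a rough illustration of that kind of line-number tracing (the
trace_line() function and TRACE_LINE() macro are hypothetical names,
not anything from the perl source), a minimal sketch might be:

```c
#include <stdio.h>

/* Hypothetical helper: report a source position on stderr and flush,
 * so the last line printed before a crash is the last one reached.
 * Returns fprintf()'s result (characters written, negative on error). */
static int trace_line(const char *file, int line)
{
    int n = fprintf(stderr, "reached %s:%d\n", file, line);
    fflush(stderr);
    return n;
}

/* Drop TRACE_LINE(); between statements in the suspect function. */
#define TRACE_LINE() trace_line(__FILE__, __LINE__)
```

Unlike a debugger, this still reports useful positions in an optimized
build, although the extra calls can change the code the compiler
generates.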

But this is equivalent to:

    if ((and_with)->type == 21 && (and_with)->next_off == 1)

which only touches 8-bit and 16-bit values, so I went back and looked
at the code around the crash:

0x4000000000147768 <S_ssc_and+192>:     depdi,z -1,31,24,ret0

This essentially stores 0x00ffffff00000000 into register ret0.

0x400000000014776c <S_ssc_and+196>:     and r31,ret0,ret0

This would mask the 64-bit value (if we hadn't crashed).
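
To see where that constant comes from, here is a small C sketch of the
deposit.  It assumes the usual reading of the instruction: PA-RISC
numbers bits from 0 (most significant) to 63 (least significant), and
"depdi,z imm,pos,len,target" places the low len bits of imm so that the
field's rightmost bit sits at bit pos, zeroing everything else.  The
depdi_z() name is made up for the illustration.

```c
#include <stdint.h>

/* Sketch of the value "depdi,z imm,pos,len,target" leaves in target.
 * Bit 0 is the most significant bit, bit 63 the least significant.
 * Assumes 1 <= len <= 63 and len - 1 <= pos <= 63. */
static uint64_t depdi_z(int64_t imm, unsigned pos, unsigned len)
{
    /* keep only the low `len` bits of the immediate */
    uint64_t field = (uint64_t)imm & ((UINT64_C(1) << len) - 1);

    /* bit `pos` counted from the top is bit (63 - pos) from the bottom */
    return field << (63 - pos);
}
```

With imm = -1, pos = 31 and len = 24 this gives 24 one bits covering
bits 8 to 31, which is the mask quoted above.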

So I took a look at struct regnode:

  struct regnode {
      U8  flags;
      U8  type;
      U16 next_off;
  };

PA-RISC is big-endian, so read as a single 64-bit value this would be:

  0xFFTTNNNNXXXXXXXX

(F for flags, T for type, N for next_off, X for the bytes following
the node)

so the above mask would isolate type and next_off, and the following
instructions check the value and branch as needed.
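
The combined check can be sketched in portable C.  This is only an
illustration of the idea, not the compiler's output: load_be64() and
type_and_next_off_match() are hypothetical helpers, and the bytes are
assembled one at a time so the sketch behaves the same on any host
rather than relying on an aligned big-endian load.

```c
#include <stdint.h>

/* Mask produced by "depdi,z -1,31,24,ret0": keeps type and next_off,
 * drops flags and the four bytes that follow the node. */
#define TYPE_AND_NEXT_OFF_MASK UINT64_C(0x00ffffff00000000)

/* Assemble eight bytes the way a big-endian 64-bit load would see them. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: one masked compare standing in for
 * "node->type == type && node->next_off == next_off". */
static int type_and_next_off_match(const unsigned char *node_bytes,
                                   uint8_t type, uint16_t next_off)
{
    /* type sits in the second byte, next_off in the third and fourth */
    uint64_t want = ((uint64_t)type << 48) | ((uint64_t)next_off << 32);
    return (load_be64(node_bytes) & TYPE_AND_NEXT_OFF_MASK) == want;
}
```

For a node with type 21 and next_off 1 the masked value is
0x0015000100000000, whatever the flags byte and the trailing four
bytes contain.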

So why is the compiler generating code assuming 64-bit alignment?  At
the time, and_with was passed as type struct regnode_ssc *, and struct
regnode_ssc is:

  struct regnode_ssc {
    U8  flags;                          /* ANYOF_POSIXL bit must go here */
    U8  type;
    U16 next_off;
    U32 arg1;                           /* used as ptr in S_regclass */
    char bitmap[ANYOF_BITMAP_SIZE];     /* both compile-time */
    U32 classflags;                     /* and run-time */
    SV* utf8_locale_list;               /* list of code points matched by folds
                                           in a UTF-8 locale */
    SV* invlist;                        /* list of code points matched */
  };

The pointer members mean the compiler can assume the struct is 64-bit
aligned, so it chose to read both type and next_off with a single
64-bit load, rather than separate ldb (load byte) and ldh (load
half-word) instructions.

The actual crash was caused by passing nodes from a compiled regexp
into S_ssc_and(); those are aligned only on a 32-bit boundary, not a
64-bit boundary, so the 64-bit load faulted.
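
The alignment mismatch can be shown with a small C11 sketch.  The
structs below are simplified stand-ins for the perl ones: U8/U16/U32
are replaced by <stdint.h> types, SV* by void*, and ANYOF_BITMAP_SIZE
is assumed to be 32.  is_aligned(), node_at() and demo_program are
hypothetical names used only for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ANYOF_BITMAP_SIZE 32    /* assumption: 256 bits / 8 */

struct regnode {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
};

struct regnode_ssc {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
    uint32_t arg1;
    char     bitmap[ANYOF_BITMAP_SIZE];
    uint32_t classflags;
    void    *utf8_locale_list;  /* SV* in the real struct */
    void    *invlist;           /* SV* in the real struct */
};

/* Nonzero if p is a multiple of align. */
static int is_aligned(const void *p, size_t align)
{
    return (uintptr_t)p % align == 0;
}

/* Address of the index-th regnode-sized slot in a compiled program. */
static const void *node_at(const void *program, size_t index)
{
    return (const unsigned char *)program + index * sizeof(struct regnode);
}

/* Stand-in for a compiled program that starts on an 8-byte boundary. */
static _Alignas(8) unsigned char demo_program[16];
```

On an LP64 build _Alignof(struct regnode_ssc) is 8 because of the
pointer members, while node_at(demo_program, 1) is only 4-byte
aligned: fine for a struct regnode *, but not something the compiler
is obliged to handle once the pointer is typed as struct regnode_ssc *.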

[1] http://ftp.parisc-linux.org/docs/arch/parisc2.0.pdf
    PA-RISC 2.0 reference manual; this site was down as I wrote this.

