Re: [perl #18915] array assignment works correctly only in debugger

From: Nicholas Clark
Date: December 7, 2002 10:50
Subject: Re: [perl #18915] array assignment works correctly only in debugger
Message ID: 20021207184658.GF283@Bagpuss.unfortu.net

On Fri, Dec 06, 2002 at 03:18:40PM -0000, Vilem Marsik wrote:

> A program behaves correctly if I set a breakpoint in the debugger at line 10 or 11 (or step through every line), and incorrectly if the breakpoint is at
> line 12 or later, or there is no breakpoint, or it is run without the debugger.
> I have this program:
> 
> #!/usr/bin/perl -w
> 
> $filename = $ARGV[0] or die "Parameter (DTD file) missing";
> 
> open FILE,$filename or die "Cannot open $filename";
> while ($row=<FILE>)
> {
>         if($row =~ /<!ELEMENT\s+(\w+)\s+(\([^\)]+\)|EMPTY)/)
>         {
>                 my @d;
>                 @d = map {$_ ? ($_) : ()} ($2 eq "EMPTY" ? () : split /[|,+*+()\s]+/,$2);
>                 printf "%s\n",join ',',@d;
>                 exit;
>         }
> }
> 
> And a datafile containing a single line:
> 
> <!ELEMENT benchmark (readin,database,readout)>

> Environment for perl v5.8.0:
>     HOME=/home/vm
>     LANG=en_US.UTF-8
>     LANGUAGE (unset)
>     LD_LIBRARY_PATH=:/home/vm/sqllib/lib
>     LOGDIR (unset)

It's a utf8 regexp bug; I think it has something to do with the swash code.
It's still unsolved in the development version (tested with 18251).

Your program should work if you unset $LANG (or change it to something that
isn't UTF8) - not a fix, but hopefully an acceptable workaround.
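
As a rough sketch of that workaround, a small wrapper could re-run the
script with $LANG forced to a non-UTF8 locale. The wrapper, the helper name
non_utf8_env and the usage line are hypothetical, and it assumes $LANG is
the only locale variable selecting UTF8 in your environment:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return a copy of an environment hash with LANG
# forced to the plain "C" locale, which is not a UTF8 locale.
sub non_utf8_env {
    my (%env) = @_;
    $env{LANG} = 'C';
    return %env;
}

# Hypothetical usage: wrapper.pl script.pl file.dtd
# Re-runs the given script with the same perl binary under the modified
# environment. Does nothing when called without arguments.
if (@ARGV) {
    %ENV = non_utf8_env(%ENV);
    exec $^X, @ARGV or die "Cannot exec $^X: $!";
}
```

Setting LANG from the shell for a single run would have the same effect;
the wrapper only saves retyping it.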

The smallest case I can get it down to is:

#!/usr/bin/perl -w

for my $a (0,1) {
  $_ = 'readin,database,readout';
  if ($ARGV[0])  {
    $_ .= chr 256;
    chop;
  }
  /(.+)/;

  my @d = split /[,]/,$1;
  print join (':',@d), "\n";
}
__END__

Without any arguments (not utf8) I see:

readin:database:readout
readin:database:readout

With $ARGV[0] true (utf8), I see:

#adin:database:readout
readin:database:readout

So it's something caused by the regexp compiler meeting a character class in
the regexp for split, realising that it needs to load UTF8 data, and whatever
it calls to perform that load is triggering corruption of the capture variable
($2 in your program, $1 in my test case).

I suspect that when you run it under the debugger, the regexp compiler has
already met a regexp that has caused it to load whatever tables it needed, so
there is no data load taking place during the regexp in the split, and so
no corruption.
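
For reference, the ".= chr 256; chop" step in the test case is what makes
the string utf8 without changing its characters. A minimal sketch of that
effect, using a hypothetical helper upgraded_copy and the core function
utf8::is_utf8 to inspect the internal flag:

```perl
#!/usr/bin/perl -w
use strict;

# Same trick as the test case: append a character above 255, which forces
# the string into perl's internal utf8 representation, then chop it off.
sub upgraded_copy {
    my ($s) = @_;
    $s .= chr 256;
    chop $s;
    return $s;
}

my $plain    = 'readin,database,readout';
my $upgraded = upgraded_copy($plain);

# utf8::is_utf8 reports the internal utf8 flag of a scalar.
printf "plain:    utf8 flag %s\n", utf8::is_utf8($plain)    ? 'on'  : 'off';
printf "upgraded: utf8 flag %s\n", utf8::is_utf8($upgraded) ? 'on'  : 'off';
printf "same characters: %s\n",    $plain eq $upgraded      ? 'yes' : 'no';
```

The two strings compare equal, so any difference in how split treats them
comes from the internal representation rather than from the data.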


Thanks for the report - I'm not sure when we'll have a fix for this perl
bug.

Nicholas Clark
-- 
INTERCAL better than perl?	http://www.perl.org/advocacy/spoofathon/
