develooper Front page | perl.perl5.porters | Postings from February 2011

Bug with given() localization, pos() magic, and m//gc state

From: Tom Christiansen
Date: February 20, 2011 13:31
Subject: Bug with given() localization, pos() magic, and m//gc state
Message ID: 17933.1298237447@chthon

(Tested on 5.13.9.)

I can't figure out whether I've found a bug, or whether I'm being dense.
What happens is that if you do not manually reset the pos state when you
finally hit the end of the string, the next time through you are still at
the end.  Well, of course.  Right.

But this happens when it is not the same variable anymore!  That is, it's 
a new my variable that we're localizing via given().  It seems that the 
pos() magic isn't getting reset when I feel that it should be.  Only when 
you manually reset it do things get better, and no combination of my() 
and local() makes any difference.  Without a manual reset when done
(undef pos, or pos = undef, or pos = 0), you get this behavior:

    Tokenizing string <One room @ $100/night>
      pos $string is undef
      pos $_      is undef
	    @??=Letters      <One>
	    @ 3=Separators   < >
	    @ 4=Letters      <room>
	    @ 8=Separators   < >
	    @ 9=Punctuation  <@>
	    @10=Separators   < >
	    @11=Symbols      <$>
	    @12=Numbers      <100>
	    @15=Punctuation  </>
	    @16=Letters      <night>
	    @21=Done

    Tokenizing string <How now... O Brown Cow?!>
      pos $string is undef
      pos $_      is undef
	    @21=Letters      <w>
	    @22=Punctuation  <?!>
	    @24=Done

    Tokenizing string <Quoth the raven, "Nevermore.">
      pos $string is undef
      pos $_      is undef
	    @24=Letters      <ore>
	    @27=Punctuation  <.">
	    @29=Done

    Tokenizing string <That's all, folks!>
      pos $string is undef
      pos $_      is undef
	    @29=Done

    Tokenizing string <FINIS>
      pos $string is undef
      pos $_      is undef
	    @29=Done

See the way the pos magic gets stuck?  If you uncomment the
"undef pos" line below, everything works fine.  Am I doing something
stupid, or is this a genuine bug?
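
For reference, here is a minimal sketch of the pos() behavior the report relies on, using plain lexicals and no given(). It does not reproduce the stuck position described above; it only illustrates, as far as the documented semantics go, that pos() is stored on the matched scalar, that a failed match under /c leaves it alone, and that assigning undef clears it. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

# pos() is stored on the scalar that was matched, not on the regex.
my $first  = "abc 123";
my $second = "xyz";

$first =~ /\G(\pL+)/gc;          # matches "abc"; pos($first) becomes 3
my $after_match = pos($first);

$first =~ /\G(\pN+)/gc;          # fails at offset 3 (a space); /c keeps pos
my $after_fail = pos($first);

my $fresh = pos($second);        # $second has never been matched

pos($first) = undef;             # the manual reset
my $after_reset = pos($first);

printf "after match:        %s\n", $after_match // "undef";   # 3
printf "after failed match: %s\n", $after_fail  // "undef";   # 3
printf "other scalar:       %s\n", $fresh       // "undef";   # undef
printf "after reset:        %s\n", $after_reset // "undef";   # undef
```

By that reading, a fresh copy of a different string would be expected to start with an undefined pos(), which is why the carried-over offsets in the output above look surprising.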


Thanks much!

--tom

#!/usr/bin/env perl
#
# forgiven - demo for/given bug in m/\G/c 
#   Tom Christiansen <tchrist@perl.com>
#   Sun Feb 20 14:14:21 MST 2011

use 5.13.0;
use strict;
use autodie;
use warnings qw[ FATAL all ];

END { close STDOUT }
$| = 1;

#################################################################

our @Lines = ( 
    q{One room @ $100/night},
    q{How now... O Brown Cow?!},
    q{Quoth the raven, "Nevermore."},
    q{That's all, folks!},
    q{FINIS},
);

for my $line (@Lines) {
    tokenize($line);
} 

exit;

#################################################################

sub tokenize {
    my $string = shift();
    my $mask = "%-12s <%s>\n";
    printf "Tokenizing string <%s>\n", $string;

    ### These don't help:
    ###     local $_;
    ###     my    $_;

    printf "  pos \$string is %s\n", pos($string) // "undef";
    printf "  pos \$_      is %s\n", pos          // "undef";

TOKEN: for (;;) { 
	 given ($string) {

	    printf "\t\@%2s=", pos // "??";

	    when ((pos || 0) >= length) {
		### XXX: uncomment this next line, and all works; WHY??
		### undef pos;   
		last TOKEN;
	    } 

	    printf $mask, Letters      => $1      when  /\G(\pL+)/gc;
	    printf $mask, Numbers      => $1      when  /\G(\pN+)/gc;
	    printf $mask, Symbols      => $1      when  /\G(\pS+)/gc;
	    printf $mask, Punctuation  => $1      when  /\G(\pP+)/gc;
	    printf $mask, Separators   => $1      when  /\G(\pZ+)/gc;
	    printf $mask, Marks        => $1      when  /\G(\pM+)/gc;
	    printf $mask, Other        => $1      when  /\G(\pC+)/gc;

	    default {
	      die "UNCLASSIFIED: " .
		substr($_, pos || 0, (length > 65) ? 65 : length);
	    }
        }  
    }     

    say "Done\n";
} 
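
As a point of comparison, the same \G-anchored scanning can be sketched without given/when, so that pos() is only ever read from and set on the lexical $string. This is an illustrative rewrite under stated assumptions, not the script above and not a diagnosis of the reported behavior: tokenize_plain and @CLASSES are hypothetical names, and the "Other" class is written as \pC here on the assumption that the general category Other is what was meant.

```perl
use strict;
use warnings;

# Hypothetical table of token classes, tried in order at the current pos().
my @CLASSES = (
    [ Letters     => qr/\G(\pL+)/ ],
    [ Numbers     => qr/\G(\pN+)/ ],
    [ Symbols     => qr/\G(\pS+)/ ],
    [ Punctuation => qr/\G(\pP+)/ ],
    [ Separators  => qr/\G(\pZ+)/ ],
    [ Marks       => qr/\G(\pM+)/ ],
    [ Other       => qr/\G(\pC+)/ ],
);

# Hypothetical helper: returns a list of [class, text] pairs for one string.
sub tokenize_plain {
    my ($string) = @_;
    my @tokens;
    pos($string) = 0;                        # explicit start for this copy
  TOKEN:
    while (pos($string) < length $string) {
        for my $class (@CLASSES) {
            my ($name, $re) = @$class;
            if ($string =~ /$re/gc) {        # /c keeps pos() when a class fails
                push @tokens, [ $name, $1 ];
                next TOKEN;
            }
        }
        die "UNCLASSIFIED at offset " . pos($string);
    }
    return @tokens;
}

for my $line ('One room @ $100/night', 'FINIS') {
    print "Tokenizing string <$line>\n";
    printf "    %-12s <%s>\n", @$_ for tokenize_plain($line);
}
```

Because each call sets pos($string) on its own copy before scanning, the second string starts at offset 0 regardless of where the first one ended.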

