develooper Front page | perl.perl5.porters | Postings from February 2003

Re: [perl #20912] UTF8 related glitch + fix

From: Enache Adrian
Date: February 14, 2003 14:27
Subject: Re: [perl #20912] UTF8 related glitch + fix
Message ID: 20030214223740.GA13575@ratsnest.hole

On Thu, Feb 13, 2003 at 05:31:38AM -0000, widyono@cis.upenn.edu (via RT) wrote:
> The following works:
> 
> perl -e '@parse=split(/[, ]+/,"io0, io1"); print "$parse[0]\n$parse[1]\n";'
> 
> The following works in 5.6.1 on linux, but in 5.8.0 (RedHat's 8.0 default
> install RPM built for i386-linux-multi-thread), spits out
> Split loop, <STDIN> line 1.
> 
> perl -e '$input=<STDIN>;chop($input);@parse=split(/[, ]+/, $input);print
> "$parse[0]\n$parse[1]\n";'
> 
> Happens even if $input is reassigned to another var and that var is used
> in split.  Does not happen if [] is not used (what would appropriate
> REGEXP be in that case, without using []?).
> 
> Does not happen with LANG=C.

I can reproduce it on bleadperl too.
The example can be reduced to:

$ perl -le '$p="a,b"; utf8::upgrade $p; split(/[, ]+/,$p)'
Split loop at -e line 1.

(If the locale is utf8, $input=<STDIN> above becomes
 utf8 'colored' too.)

However, this works:

$ perl -le '$p="a,b"; utf8::upgrade $p; print split(/[, ]+/,$p)'
ab

It obviously has to do with the trick pp_split() uses: if the
list it returns has to be assigned to an array (@_ if 'split'
was called in scalar context), it uses that array as the stack
(pp.c:4425).

When the string to be split is utf8-flagged, the regexp engine
(at pp.c:4550) may call subs from the utf8 perl module, bracketing
those calls with PUSHSTACKi/POPSTACK pairs
(utf8.c - Perl_swash_init()/Perl_swash_fetch()).

POPSTACK then pops back to PL_curstackinfo->si_stack, not to the
array/stack pp_split() has just switched to, so pp_split() continues
on the wrong stack.

The following patch fixes this bug. Please try it.
Regards
Adi

----------------------------------------------------------------------------
--- /arc/perl-current/pp.c	2003-02-02 19:59:19.000000000 +0200
+++ perl-current/pp.c	2003-02-15 00:29:51.000000000 +0200
@@ -4423,6 +4423,7 @@ PP(pp_split)
 	    }
 	    /* temporarily switch stacks */
 	    SWITCHSTACK(PL_curstack, ary);
+	    PL_curstackinfo->si_stack = ary;
 	    make_mortal = 0;
 	}
     }
@@ -4620,6 +4621,7 @@ PP(pp_split)
     if (realarray) {
 	if (!mg) {
 	    SWITCHSTACK(ary, oldstack);
+	    PL_curstackinfo->si_stack = oldstack;
 	    if (SvSMAGICAL(ary)) {
 		PUTBACK;
 		mg_set((SV*)ary);
----------------------------------------------------------------------------
#!/usr/bin/perl
require "test.pl";

eval { $p="a,b"; utf8::upgrade $p; split(/[, ]+/,$p) };
is ($@, '', '#20912 - split() fails with /[]+/ & utf8');

__END__
----------------------------------------------------------------------------
