
Re: [perl #52104] Text::Wrap::wrap() generates a segfault with Cyrillic characters when the utf8 flag is turned on

From: Nicholas Clark
Date: March 26, 2008 14:07
Subject: Re: [perl #52104] Text::Wrap::wrap() generates a segfault with Cyrillic characters when the utf8 flag is turned on
Message ID: 20080326210722.GF79799@plum.flirble.org

On Wed, Mar 26, 2008 at 12:50:55PM +0000, Nicholas Clark wrote:
> On Tue, Mar 25, 2008 at 05:08:10PM -0700, Frédéric Buclin wrote:
> 
> > As described at https://bugzilla.mozilla.org/show_bug.cgi?id=423439, 
> > Text::Wrap::wrap() generates a segfault with Cyrillic characters when 
> > the utf8 flag is turned on. The testcase given in the bug, 
> > https://bugzilla.mozilla.org/attachment.cgi?id=311526 (a simple Perl 
> > script to run from the shell) shows this very clearly. Due to this bug, 
> > all pages containing such strings are left blank, which is a real 
> > problem for webapps such as Bugzilla.
> 
> The bug seems to be caused by a regexp using pos() inside a substitution, and
> can be reduced to something like this:

> I don't know what the cause is.

It should all be fixed by the appended change, which I expect will be in 5.8.9
soon (RC1 within weeks).

Nicholas Clark

Change 33580 by nicholas@nicholas-saigo on 2008/03/26 21:05:20

	The offset for pos is stored as bytes, and converted to (Unicode)
	character position when read, if needed. The code for setting pos
	inside subst was incorrectly converting to character position before
	storing the value. This code appears to have been buggy since it was
	added in 2000 in change 7562.

Affected files ...

... //depot/perl/pp_ctl.c#688 edit
... //depot/perl/t/op/subst.t#50 edit

Differences ...

==== //depot/perl/pp_ctl.c#688 (text) ====

@@ -298,7 +298,6 @@
     { /* Update the pos() information. */
 	SV * const sv = cx->sb_targ;
 	MAGIC *mg;
-	I32 i;
 	SvUPGRADE(sv, SVt_PVMG);
 	if (!(mg = mg_find(sv, PERL_MAGIC_regex_global))) {
 #ifdef PERL_OLD_COPY_ON_WRITE
@@ -308,10 +307,7 @@
 	    mg = sv_magicext(sv, NULL, PERL_MAGIC_regex_global, &PL_vtbl_mglob,
 			     NULL, 0);
 	}
-	i = m - orig;
-	if (DO_UTF8(sv))
-	    sv_pos_b2u(sv, &i);
-	mg->mg_len = i;
+	mg->mg_len = m - orig;
     }
     if (old != rx)
 	(void)ReREFCNT_inc(rx);

==== //depot/perl/t/op/subst.t#50 (xtext) ====

@@ -7,7 +7,7 @@
 }
 
 require './test.pl';
-plan( tests => 136 );
+plan( tests => 139 );
 
 $x = 'foo';
 $_ = "x";
@@ -583,3 +583,11 @@
     is($want,$_,"RT#17542");
 }
 
+{
+    my @tests = ('ABC', "\xA3\xA4\xA5", "\x{410}\x{411}\x{412}");
+    foreach (@tests) {
+	my $id = ord $_;
+	s/./pos/ge;
+	is($_, "012", "RT#52104: $id");
+    }
+}
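
The change description above comes down to a double conversion: the stored
offset is meant to be in bytes, and readers convert it to characters. The
following is a minimal standalone sketch of that effect, not the perl
internals. The names bytes_to_chars, pos_read_fixed and pos_read_buggy are
hypothetical, and the helper assumes valid UTF-8 and an offset within the
string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, loosely analogous to a byte-to-character offset
 * conversion: count the UTF-8 characters that start within the first
 * byte_off bytes of s. Assumes s is valid UTF-8 and byte_off is in range. */
size_t bytes_to_chars(const unsigned char *s, size_t byte_off)
{
    size_t chars = 0;
    size_t i;
    for (i = 0; i < byte_off; i++) {
        /* Continuation bytes have the form 10xxxxxx; skip those. */
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* Sketch of the fixed behaviour: store the byte offset unchanged and
 * convert to characters only when the value is read. */
size_t pos_read_fixed(const unsigned char *s, size_t match_byte_off)
{
    size_t stored = match_byte_off;           /* stored as bytes */
    return bytes_to_chars(s, stored);         /* converted on read */
}

/* Sketch of the buggy behaviour: convert to characters before storing,
 * then let the reader convert again as if the value were still bytes. */
size_t pos_read_buggy(const unsigned char *s, size_t match_byte_off)
{
    size_t stored = bytes_to_chars(s, match_byte_off); /* already chars */
    return bytes_to_chars(s, stored);                  /* converted twice */
}

/* Usage example: U+0410 U+0411 U+0412 encoded as UTF-8, two bytes each. */
void check_examples(void)
{
    const unsigned char *cyr =
        (const unsigned char *)"\xD0\x90\xD0\x91\xD0\x92";
    assert(pos_read_fixed(cyr, 4) == 2);  /* two characters consumed */
    assert(pos_read_buggy(cyr, 4) == 1);  /* double conversion undercounts */
}
```

For pure ASCII input both variants agree, since byte and character offsets
coincide. That is consistent with the new test covering an ASCII string, a
Latin-1 string and a Cyrillic string.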
