perl.perl5.porters | Postings from August 2012

[perl #114410] Substr giving wrong results on $1 with utf8

From: Father Chrysostomos via RT
Date: August 31, 2012 08:29
Subject: [perl #114410] Substr giving wrong results on $1 with utf8
Message ID: rt-3.6.HEAD-11172-1346426987-1236.114410-15-0@perl.org

On Fri Aug 31 02:28:08 2012, nicholas wrote:
> On Thu, Aug 30, 2012 at 06:20:13PM -0700, Father Chrysostomos via RT
> wrote:
> > I've fixed this in commit 7d1328bb7c by resetting utf8 caches in mg_get
> 
> This seems like a sensible solution. (I can't spot any flaws in the
> approach)
> 
> 
> Historically, if something breaks because of tie, it usually also breaks
> with overloading:
> 
> $ cat 114410.pl
> #!/perl -w
> use strict;
> 
> package UTF8Toggle;
> use strict;
> 
> use overload '""' => 'stringify', fallback => 1;
> 
> sub new {
>     my $class = shift;
>     my $value = shift;
>     my $state = shift||0;
>     return bless [$value, $state], $class;
> }
> 
> sub stringify {
>     my $self = shift;
>     $self->[1] = ! $self->[1];
>     if ($self->[1]) {
> 	utf8::downgrade($self->[0]);
>     } else {
> 	utf8::upgrade($self->[0]);
>     }
>     $self->[0];
> }
> 
> package main;
> 
> my $u = UTF8Toggle->new(" \x{c2}7 ");
> 
> printf "%d\n", ord substr $u, 1;
> printf "%d\n", ord substr $u, 1;
> 
> __END__
> $ ./perl -Ilib 114410.pl
> 194
> panic: sv_pos_u2b_cache cache 5 real 4 for  �7  at 114410.pl line 32.
> 
> 
> I'm not sure what the best fix is here. Given that I'd been wondering if the
> fix for #114410 was to outlaw caching of tied values, but simply expiring
> the cache on the next read works, is the right fix here to trap all points
> that call into overload value returning routines and reset the cache?

As you may have noticed (if not: git log d8f2f09061), I went searching
for instances of sv_len_utf8 that were incorrect.

In the process of doing so, I noticed this in a few places (this one in
pp_sys.c:pp_syswrite):

		if (SvGMAGICAL(bufsv) || SvAMAGIC(bufsv)) {
		    /* Don't call sv_len_utf8 again because it will call magic
		       or overloading a second time, and we might get back a
		       different result.  */
		    blen_chars = utf8_length((U8*)buffer, (U8*)buffer + blen);
		} else {
		    /* It's safe, and it may well be cached.  */
		    blen_chars = sv_len_utf8(bufsv);
		}
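
To make the hazard in that comment concrete, here is a minimal standalone sketch in C. It does not use the perl API: toy_value, toy_fetch, toy_len_refetch and toy_utf8_length are hypothetical stand-ins invented for illustration, and the fetch function simply alternates between two strings, much as the UTF8Toggle object above alternates between representations.

```c
#include <assert.h>
#include <stddef.h>

/* Count UTF-8 characters in [s, e) by counting non-continuation bytes.
   Assumes well-formed UTF-8; a hypothetical stand-in for utf8_length(). */
static size_t toy_utf8_length(const unsigned char *s, const unsigned char *e)
{
    size_t n = 0;
    assert(s != NULL && e != NULL && s <= e);
    for (; s < e; s++) {
        if ((*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical "magical" value: every fetch may hand back different data,
   much as a tied scalar or an overloaded object can. */
typedef struct {
    int fetches;  /* number of times the value has been fetched */
} toy_value;

/* Fetch the current string and its byte length.  Odd-numbered fetches
   yield "a" followed by U+00E9 (2 characters, 3 bytes); even-numbered
   fetches yield "abcd" (4 characters, 4 bytes). */
static const unsigned char *toy_fetch(toy_value *v, size_t *len)
{
    static const unsigned char first[]  = { 'a', 0xC3, 0xA9 };
    static const unsigned char second[] = { 'a', 'b', 'c', 'd' };

    v->fetches++;
    if (v->fetches % 2) {
        *len = sizeof first;
        return first;
    }
    *len = sizeof second;
    return second;
}

/* Unsafe pattern: fetches the value a second time, so the count it
   returns may describe a different string from the buffer in hand. */
static size_t toy_len_refetch(toy_value *v)
{
    size_t len;
    const unsigned char *p = toy_fetch(v, &len);
    return toy_utf8_length(p, p + len);
}
```

If a caller fetches once to get buffer and blen, then toy_utf8_length(buffer, buffer + blen) describes exactly those bytes, whereas toy_len_refetch() triggers another fetch and reports the length of whatever that second fetch produced.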

And I slept on it and came to the conclusion that overload couldn’t work
correctly despite my changes (which you have now demonstrated).

I’m also wondering whether it’s even worth creating the utf8 cache to
begin with on magical values, as it will be invalidated almost
immediately.  It seems that the extra work to facilitate an optimisation
actually slows things down.

In that case, my sv_len_utf8_nomg should go, and we should have a macro
that does what syswrite already does.
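
A rough sketch of the shape such a macro might take, written as standalone C rather than against the perl headers. Everything here is an assumption for illustration: toy_sv, its magical flag (standing in for the SvGMAGICAL/SvAMAGIC test), toy_sv_len_utf8 and the macro name TOY_LEN_UTF8_FROM_BUF are hypothetical and are not names from the perl source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a scalar: a flag saying whether reading it can
   run user code, plus a cached character count for the plain case. */
typedef struct {
    int    magical;       /* stand-in for SvGMAGICAL(sv) || SvAMAGIC(sv) */
    size_t cached_chars;  /* stand-in for a cached utf8 length */
} toy_sv;

/* Count UTF-8 characters in [s, e) by counting non-continuation bytes.
   Assumes well-formed UTF-8; a hypothetical stand-in for utf8_length(). */
static size_t toy_utf8_length(const unsigned char *s, const unsigned char *e)
{
    size_t n = 0;
    assert(s != NULL && e != NULL && s <= e);
    for (; s < e; s++) {
        if ((*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Stand-in for sv_len_utf8() on a non-magical value: trust the cache. */
static size_t toy_sv_len_utf8(const toy_sv *sv)
{
    return sv->cached_chars;
}

/* For magical or overloaded values, count characters in the bytes the
   caller already holds; otherwise use the (possibly cached) length. */
#define TOY_LEN_UTF8_FROM_BUF(sv, pv, bytelen)                        \
    ((sv)->magical                                                    \
        ? toy_utf8_length((const unsigned char *)(pv),                \
                          (const unsigned char *)(pv) + (bytelen))    \
        : toy_sv_len_utf8(sv))
```

The point of the design is that the magical branch never touches the value again, so no second FETCH or stringification can happen between obtaining the buffer and measuring it.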

-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: resolved
https://rt.perl.org:443/rt3/Ticket/Display.html?id=114410
