develooper Front page | perl.perl5.porters | Postings from July 2014

Re: [perl #122148] "Malformed UTF-8 character (unexpected end of string)" on a tainted string in 5.20

From: Dave Mitchell
Date: July 2, 2014 16:43
Subject: Re: [perl #122148] "Malformed UTF-8 character (unexpected end of string)" on a tainted string in 5.20
Message ID: 20140702164312.GH15476@iabyn.com

On Tue, Jun 24, 2014 at 11:53:02AM +0100, Dave Mitchell wrote:
> On Sat, Jun 21, 2014 at 01:06:42PM -0600, Karl Williamson wrote:
> > On 06/20/2014 07:53 PM, Mark Martinec (via RT) wrote:
> > >Under perl 5.20.0 the following program fails (or warns) on:
> > >
> > >    Malformed UTF-8 character (unexpected end of string)
> > >      in substitution iterator at ./test.pl line 16.
> 
> I can reduce the demo code to the following:
> 
>     $ p -Twe '$_ = "XXXX\x{1000}aaaaaaaaaaaaaaaaaXX" . $^X; s/X/"xxxxxx"/ge'
>     Malformed UTF-8 character (unexpected end of string) in substitution iterator at -e line 1.
>     $
> 
> I haven't looked into it any further yet.

Now fixed with the following commit, which is a good candidate for 5.20.1:

commit cda67c9995c6d90b71a0939aaae084e1869b8248
Author:     David Mitchell <davem@iabyn.com>
AuthorDate: Wed Jul 2 17:13:45 2014 +0100
Commit:     David Mitchell <davem@iabyn.com>
CommitDate: Wed Jul 2 17:22:52 2014 +0100

    s///e on tainted utf8 strings got pos() messed up
    
    RT #122148: In 5.20, commit 25fdce4a165 changed the way pos() was stored
    in magic attached to SVs from being a byte offset to a char offset,
    *except* that, for efficiency, strings being used for pattern matching
    were kept as byte offsets (with a flag indicating thus), *except* where
    the SV already had magic attached (such as taint, as in the bug report and
    in this commit's test), in which case it kept it as chars.
    
    The code that updated pos() after an iteration of s///e was faulty: the
    string buffer it used for converting byte lengths to char lengths (via
    utf8_length()) was the wrong buffer: rather than using the src string
    being matched against, it was using the destination string being built up
    via iterations of s///. Once multi-byte utf8 chars were involved, all the
    pos() calculations went wrong, and utf8 warnings started mysteriously
    appearing.
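
To make the byte/char mismatch concrete, here is a rough Perl-level sketch.
It is not the C code from the fix. chars_in_bytes() is a hypothetical helper
that stands in for what utf8_length() does over a byte range, and the
shortened string and the contents of $dst are assumptions made up purely for
illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Illustrative source string (shorter than the one in the bug report).
# U+1000 is one character but three bytes in UTF-8.
my $src = "XXXX\x{1000}aaaXX";

# Hypothetical helper: how many characters are encoded in the first
# $nbytes bytes of $buf's UTF-8 form?  A Perl-level stand-in for
# calling utf8_length() on a byte range.
sub chars_in_bytes {
    my ($buf, $nbytes) = @_;
    my $prefix = substr(encode_utf8($buf), 0, $nbytes);
    return length(decode_utf8($prefix));
}

# Match up to and including the first X after the a's.
$src =~ /a+X/g or die "no match";

# At Perl level pos() is a character offset ...
my $char_pos = pos($src);
# ... while the same position expressed in bytes is larger, because the
# three-byte character precedes it.
my $byte_pos = length(encode_utf8(substr($src, 0, $char_pos)));

# Assumed state of the output buffer at that point in s/X/"xxxxxx"/ge:
# four replacements followed by the unmatched text copied over so far.
my $dst = ("xxxxxx" x 4) . "\x{1000}aaa";

# Converting the byte offset back to chars only works against the
# string the offset was measured in.
my $right = chars_in_bytes($src, $byte_pos);
my $wrong = chars_in_bytes($dst, $byte_pos);

print "char pos=$char_pos byte pos=$byte_pos\n";
print "via src=$right via dst=$wrong\n";
```

With this input the conversion against $src gives back the character
position that pos() reports, while the conversion against $dst does not,
since the first bytes of $dst are all single-byte characters.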




-- 
No man treats a motor car as foolishly as he treats another human being.
When the car will not go, he does not attribute its annoying behaviour to
sin, he does not say, You are a wicked motorcar, and I shall not give you
any more petrol until you go. He attempts to find out what is wrong and
set it right.
    -- Bertrand Russell,
       Has Religion Made Useful Contributions to Civilization?
