perl.perl5.porters | Postings from February 2017

Re: Is this approach to checking for a program's UTF-8 wellformedness ok?

From: Karl Williamson
Date: February 11, 2017 17:09
Subject: Re: Is this approach to checking for a program's UTF-8 wellformedness ok?
Message ID: 91ef54bc-2d5e-651b-2b73-8a73c5164864@khwilliamson.com

On 02/11/2017 12:04 AM, Father Chrysostomos wrote:
> Karl Williamson wrote:
>> 1) It didn't catch string evals.
>> 2) UTF8ness can change in mid chunk, after we've examined the chunk.
>>
>> The first can readily be handled by doing the same check in the eval
>> portion of lex_start.
>
> Well, ideally any scalar passed to a string eval should have been
> validated at some point already.  That is the point I was trying to
> make when I suggested using lex_start or some code in its proximity.
> We should be able to trust our own data structures, without validating
> the same text over and over again.
>
> Am I missing something?
>

It turns out that doing as I suggested, checking whether the UTF8ness
changes in mid chunk, lets me comment out the check in the string eval
code without introducing test failures.  So you're not missing anything
in that respect.

However, there is still code in the core that can introduce
malformations, and there is an open security ticket about that.  The
core was written before people were aware of the perils of malformed
UTF-8, so there may be other, more obscure code that we don't know
about that can do the same.  So I think I should leave the string eval
check in.
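
For readers following along, here is a rough sketch of the two checks
discussed above: validating a buffer before it is parsed, and
validating only the unparsed remainder when UTF8ness turns on part-way
through a chunk.  It is not the code in toke.c.  The struct
chunk_state and both function names are hypothetical, made up for this
illustration, and the validator uses the strict Unicode definition of
well-formed UTF-8, which is narrower than what Perl itself accepts
internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if buf[0..len) is well-formed UTF-8 in the strict
 * Unicode sense: no overlong forms, no surrogates, nothing above
 * U+10FFFF.  (Illustrative only; not Perl's own validator.) */
static bool utf8_is_wellformed(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t b = buf[i];
        size_t need;                  /* continuation bytes expected */
        uint8_t lo = 0x80, hi = 0xBF; /* range of 1st continuation   */

        if (b <= 0x7F) { i++; continue; }             /* ASCII       */
        else if (b >= 0xC2 && b <= 0xDF) need = 1;
        else if (b == 0xE0) { need = 2; lo = 0xA0; }  /* no overlong */
        else if (b >= 0xE1 && b <= 0xEC) need = 2;
        else if (b == 0xED) { need = 2; hi = 0x9F; }  /* no surrogate */
        else if (b >= 0xEE && b <= 0xEF) need = 2;
        else if (b == 0xF0) { need = 3; lo = 0x90; }  /* no overlong */
        else if (b >= 0xF1 && b <= 0xF3) need = 3;
        else if (b == 0xF4) { need = 3; hi = 0x8F; }  /* <= U+10FFFF */
        else return false;            /* 0x80-0xC1 or 0xF5-0xFF      */

        if (len - i - 1 < need) return false;         /* truncated   */
        if (buf[i + 1] < lo || buf[i + 1] > hi) return false;
        for (size_t k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return false;
        i += need + 1;
    }
    return true;
}

/* Hypothetical lexer state, for illustration only. */
struct chunk_state {
    const uint8_t *buf;   /* the chunk being parsed                  */
    size_t len;           /* its length in bytes                     */
    size_t pos;           /* current parse position                  */
    bool is_utf8;         /* is the chunk being treated as UTF-8?    */
};

/* Call when UTF8ness turns on in mid chunk.  The bytes before pos
 * were parsed as non-UTF-8, so only the remainder is checked.
 * Returns false, leaving is_utf8 unset, if the rest is malformed. */
static bool chunk_enable_utf8(struct chunk_state *c)
{
    if (c->is_utf8) return true;      /* already checked             */
    if (!utf8_is_wellformed(c->buf + c->pos, c->len - c->pos))
        return false;
    c->is_utf8 = true;
    return true;
}
```

In this sketch the retained string eval check would amount to one more
call to utf8_is_wellformed() on the eval'd scalar's buffer before
parsing starts, redundant when every producer is well behaved but
cheap insurance when one is not.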
