perl.perl5.porters | Postings from June 2022

Re: Pre-RFC: New C API for converting from UTF-8 to code point

From: Karl Williamson
Date: June 29, 2022 22:10
Subject: Re: Pre-RFC: New C API for converting from UTF-8 to code point
Message ID: ec3bfc82-f4da-2adf-d779-926a0dfa3331@khwilliamson.com
On 6/28/22 20:27, hv@crypt.org wrote:
> Karl Williamson <public@khwilliamson.com> wrote:
> :  Then the loop innards would look like:
> :
> :     while (s < e) {
> :         code_point = next_uvchr(s, e, &retlen);
> :
> :         if (retlen < 0) {
> 
> I'd recommend making this `if (retlen <= 0) {` or otherwise handling
> the retlen == 0 case, even if you only expect that when s >= e: the
> function should be capable of doing something reasonable on an empty
> string, which probably means not croaking.
> 
> :             ... process error ...
> :             retlen = -retlen;
> :         }
> 
> Hugo

It is currently illegal to call the existing functions with zero-length
input.  They don't croak on zero-length input, but they do assert
against it, which on DEBUGGING builds amounts to much the same thing.
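A minimal sketch of that contract, assuming a hypothetical `decode_one()` as a stand-in for the proposed `next_uvchr()` (it is not perl's implementation). The precondition `s < e` is enforced only by `assert()`, so it costs nothing when assertions are compiled out, and a malformed sequence is reported through a negative `retlen`, as in the loop quoted above. Standard C's `assert()` is used here; perl builds tie their own `assert()` to DEBUGGING instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder illustrating the calling contract; NOT perl's API.
 * Returns the code point starting at s and stores the number of bytes
 * consumed in *retlen, negated if the sequence is malformed.
 * Simplified: it does not reject overlong 3- and 4-byte forms, surrogates,
 * or code points above 0x10FFFF. */
static uint32_t decode_one(const unsigned char *s, const unsigned char *e,
                           ptrdiff_t *retlen)
{
    /* Zero-length input is a caller bug: caught on debug builds only,
     * with no runtime cost when NDEBUG is defined. */
    assert(s < e);

    unsigned char c = s[0];
    int len;
    uint32_t cp;

    if (c < 0x80) {                    /* ASCII: always 1 byte */
        *retlen = 1;
        return c;
    }
    else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else {                             /* continuation or invalid start byte */
        *retlen = -1;
        return 0xFFFD;
    }

    for (int i = 1; i < len; i++) {
        if (s + i >= e || (s[i] & 0xC0) != 0x80) {
            *retlen = -i;              /* malformed: skip the bytes examined */
            return 0xFFFD;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *retlen = len;
    return cp;
}
```

For example, decoding the two bytes `0xC3 0xA9` yields U+00E9 with `retlen` 2, while a lone `0x80` yields a `retlen` of -1.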

IIRC, the reason is performance: rejecting empty input would add an
essentially useless conditional that penalizes every caller.  And since
the caller is XS code, a malicious caller already has plenty of other
attack vectors available.

retlen is guaranteed never to be 0.  Even if s == e, the code would
attempt to read a byte, and if that didn't segfault, the byte is
guaranteed to indicate a non-zero length.
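A sketch of why a zero length cannot arise, assuming a hypothetical `utf8_start_len()` that loosely resembles perl's `UTF8SKIP()` for standard UTF-8 (perl's real table differs for `0xF8`..`0xFF`, since perl supports extended UTF-8). Every one of the 256 possible first bytes maps to a length of at least 1, so a caller's `while (s < e)` loop always makes progress, and the emptiness check lives in the caller's loop condition rather than in the callee. `count_chars()` is likewise a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the sequence length implied by a UTF-8 start byte.
 * No byte value maps to 0; stray continuation bytes and bytes that are
 * invalid in standard UTF-8 are treated as 1-byte malformations. */
static int utf8_start_len(unsigned char c)
{
    if (c < 0xC0) return 1;   /* ASCII, or a stray continuation byte */
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    return 1;                 /* 0xF8..0xFF: invalid in standard UTF-8 */
}

/* Count the characters in [s, e).  The loop condition is the only
 * emptiness/bounds check; because each step is at least 1 byte, the loop
 * always terminates.  A truncated final sequence is clamped to e. */
static size_t count_chars(const unsigned char *s, const unsigned char *e)
{
    size_t n = 0;
    while (s < e) {
        ptrdiff_t len = utf8_start_len(*s);
        assert(len >= 1);     /* the invariant the email relies on */
        if (len > e - s)
            len = e - s;      /* truncated sequence at end of buffer */
        s += len;
        n++;
    }
    return n;
}
```

With the bytes `'A' 0xC3 0xA9 0x80`, `count_chars()` advances by 1, 2, and 1 bytes and returns 3; an empty range returns 0 without ever touching the callee's logic.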



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org