Re: Broken API for is_utf8_char()
From: Karl Williamson
Date: February 3, 2012 13:20
Subject: Re: Broken API for is_utf8_char()
Message ID: 4F2C4F79.6010502@khwilliamson.com

On 11/21/2011 12:04 AM, Jarkko Hietaniemi wrote:
> While agreeing that the is_utf8_char() API is horribly insecure (well,
> as insecure as, say, strlen() when applied to unknown data) (and FWIW,
> I might be the perpetrator), I'd say that depending on its uses it
> might make more sense to split it into a few new APIs. Namely, if we do a lot
>
> is_utf8_char(p, MAX_CONSTANT)
>
> that is inherently no better than an unbounded scan, on untrusted data.
>
> If we do a lot
>
> is_utf8_char(p, n)
>
> where n is the correct number of UTF-8 bytes, hang on, how did we know
> it anyway? The only case I can think of where a generic <ptr, len> approach
> seems reasonable is
>
> is_utf8_char(p, end - p)
>
> in which case we often already know something more about this buffer
> ending in end. Like if the p is actually SvPVX(sv), in which case
> the full API with the raw pointer and length is daft.
>
I put this on the back burner for a while, but have thought about it
some more. What I now propose to do is to take Nicholas' advice and
mark this function as deprecated in embed.fnc, and to create a new
function, say

    is_utf8_char_len(p, bufsize)

The API may be daft, and that may explain why no one uses the earlier
one. is_utf8_string(s, len) works for a single character as well as a
whole string, and is typically what people want.
But for the case where someone knows what the buffer size is, the
alternative function is trivial to provide.
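
To make the shape of such a bounded check concrete, here is a minimal
standalone sketch. It is not the core implementation: the name
sketch_utf8_char_len and its return convention are hypothetical, and it
accepts only strict Unicode UTF-8 (1 to 4 bytes, no overlongs or
surrogates), which is narrower than what the core's own routines accept.

```c
#include <stddef.h>

/* Hypothetical stand-in for the proposed is_utf8_char_len(p, bufsize).
 * Returns the byte length (1..4) of the well-formed UTF-8 character that
 * starts at p, or 0 if it is malformed or does not fit in bufsize bytes.
 * Never reads beyond p[bufsize - 1]. */
static size_t sketch_utf8_char_len(const unsigned char *p, size_t bufsize)
{
    size_t need, i;
    unsigned long cp = 0;

    if (bufsize == 0)
        return 0;

    /* Work out the expected length from the first byte alone. */
    if (p[0] < 0x80)
        return 1;
    else if (p[0] >= 0xC2 && p[0] <= 0xDF) { need = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0)        { need = 3; cp = p[0] & 0x0F; }
    else if (p[0] >= 0xF0 && p[0] <= 0xF4) { need = 4; cp = p[0] & 0x07; }
    else
        return 0;   /* stray continuation byte or invalid start byte */

    /* The bounds check comes before any further byte is touched. */
    if (need > bufsize)
        return 0;

    for (i = 1; i < need; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;   /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(p[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (need == 3 && cp < 0x800)
        return 0;
    if (need == 4 && (cp < 0x10000UL || cp > 0x10FFFFUL))
        return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;

    return need;
}
```

With a known end pointer, the call shape from the quoted message would be
sketch_utf8_char_len(p, end - p); a truncated character at the end of the
buffer then fails cleanly instead of being read past.
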
There is one call to is_utf8_char() in the core. It is in the routine
common to things like is_utf8_alpha(), etc. Rather than
change the APIs for all of these, I'm proposing that we say that they
should be called only on already-validated data. We could then delete
the call, or change it to the new function as

    is_utf8_char_len(p, UTF8SIZE(p))

to get all the validation that doesn't involve buffer overflow.
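
A rough standalone sketch of that call pattern follows. All names are
hypothetical: sketch_utf8_size stands in for UTF8SIZE(p), read here as
"the length implied by the first byte", and sketch_char_ok stands in for
the bounded check and tests only structure (start byte plus the right
number of continuation bytes). It assumes the caller guarantees that the
bytes the first byte claims are readable, which is what restricting these
routines to already-validated data would provide.

```c
#include <stddef.h>

/* Hypothetical stand-in for UTF8SIZE(p): the length implied by the first
 * byte alone. Returns 1 for bytes that cannot start a character. */
static size_t sketch_utf8_size(const unsigned char *p)
{
    if (p[0] < 0x80)           return 1;
    if ((p[0] & 0xE0) == 0xC0) return 2;
    if ((p[0] & 0xF0) == 0xE0) return 3;
    if ((p[0] & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Structural check only; reads at most bufsize bytes starting at p. */
static int sketch_char_ok(const unsigned char *p, size_t bufsize)
{
    size_t need, i;

    if (bufsize == 0)
        return 0;
    if ((p[0] & 0xC0) == 0x80 || p[0] >= 0xF8)
        return 0;   /* stray continuation byte or invalid start byte */

    need = sketch_utf8_size(p);
    if (need > bufsize)
        return 0;

    for (i = 1; i < need; i++)
        if ((p[i] & 0xC0) != 0x80)
            return 0;   /* not a continuation byte */
    return 1;
}

/* The call shape described above: the bound is the length the character
 * itself claims, so the length check can never be the part that fails. */
static int sketch_checked_in_core(const unsigned char *p)
{
    return sketch_char_ok(p, sketch_utf8_size(p));
}
```

Because the bound comes from the data rather than from the real buffer,
this catches malformed sequences but cannot protect against a character
that is cut off at the end of the buffer.
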