
Re: [perl #133347] BBC SREZIC/Tk-804.034.tar.gz

From: Karl Williamson
Date: July 11, 2018 01:55
Subject: Re: [perl #133347] BBC SREZIC/Tk-804.034.tar.gz
Message ID: 27898_1531274124_5B45638C_27898_196_1_52570752-7e0b-5f2c-9058-039c4d1ed57e@khwilliamson.com

On 07/10/2018 04:14 PM, slaven@rezic.de via RT wrote:
> On Tue, 10 Jul 2018 11:03:07 -0700, public@khwilliamson.com wrote:
>> On 07/07/2018 10:37 PM, Karl Williamson wrote:
>>> On 07/07/2018 12:05 PM, (Andreas J. Koenig) (via RT) wrote:
>>>> # New Ticket Created by  (Andreas J. Koenig)
>>>> # Please include the string:  [perl #133347]
>>>> # in the subject line of all future correspondence about this issue.
>>>> # <URL: https://rt.perl.org/Ticket/Display.html?id=133347 >
>>>>
>>>>
>>>> Properly tested, perl -V output in the link, a potential BBC
>>>> candidate:
>>>>   http://www.cpantesters.org/cpan/report/60a41a6a-7d91-11e8-8819-8424e7b38300
>>>>
>>>>
>>>> Thanks,
>>>>
>>>
>>> This is using long-deprecated functions that are security holes, and
>>> which have finally been removed in 5.29.
>>>
>>
>> There is a comment in the code that says
>>
>> /* Doing these in-place seems risky ... */
>>
>>
>> And in inspecting the code for working with Unicode, I realized the API
>> can't possibly work generally as currently specified.
>>
>> There are functions to uppercase and lowercase a C string encoded in
>> UTF-8.  These functions work on the assumption that each character in
>> the case-changed result occupies the same number of bytes as the
>> original.  That simply isn't the case.  Perhaps attackers are already
>> using this.
> 
> Maybe you can try it out? In real-world code, the to_utf8_lower warning appears in text boxes when you open the search dialog (right mouse > Find), choose the settings "exact" and "nocase", and run a search. So if you know some Unicode character whose UTF-8 encoding gets longer when it is lowercased, it should be possible to overflow the buffer.

I need a working version to try it out.

But if it were just lowercase, the problem would be manageable, as there 
are only three code points that currently expand:  U+0130 (dotted 
capital I) used in Turkic languages, and two code points used in 
Sencoten, a west-coast Canadian native language.  If it is to accept 
things beyond Latin1, it would need to be rewritten so that it doesn't 
assume that every character's changed case has the same number of bytes 
as the original.  Some characters contract.
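
The byte counts behind this can be checked with a short script. This is only an illustrative sketch in Python, whose `str.lower()` applies the full Unicode case mappings; it is not the Tk or Perl core code. The two Sencoten code points are assumed here to be U+023A and U+023E, and U+212A (KELVIN SIGN) is included as an example of a character that contracts.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def lowercase_growth(ch: str) -> int:
    """Bytes gained (positive) or lost (negative) when ch is lowercased."""
    return utf8_len(ch.lower()) - utf8_len(ch)


# U+0130 is named in the message; U+023A and U+023E are an assumption
# about which two Sencoten letters are meant; U+212A contracts.
samples = ["\u0130", "\u023A", "\u023E", "\u212A"]

for ch in samples:
    name = unicodedata.name(ch)
    before = utf8_len(ch)
    after = utf8_len(ch.lower())
    print(f"U+{ord(ch):04X} {name}: {before} -> {after} bytes")
```

Any routine that lowercases in place into a buffer sized for the original string has to cope with both directions of change.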

The problem is the uppercasing.  There are quite a few characters that 
expand when uppercased.  Under what circumstances does this get called?
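
To get a feel for how many characters are affected, one could scan the whole code point range. The sketch below uses Python's built-in full case mapping as a stand-in; the exact count depends on the Unicode version bundled with the interpreter, so no number is claimed here, and the function names are made up for illustration.

```python
def utf8_len(s: str) -> int:
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def expands_on_upper(cp: int) -> bool:
    """True if uppercasing code point cp yields a longer UTF-8 sequence."""
    ch = chr(cp)
    return utf8_len(ch.upper()) > utf8_len(ch)


def uppercase_expanders() -> list:
    """All code points (surrogates excluded) that grow when uppercased."""
    return [
        cp
        for cp in range(0x110000)
        if not 0xD800 <= cp <= 0xDFFF and expands_on_upper(cp)
    ]


expanders = uppercase_expanders()
print(f"{len(expanders)} code points expand when uppercased")
for cp in expanders[:10]:
    upper = chr(cp).upper()
    print(f"U+{cp:04X}: {utf8_len(chr(cp))} -> {utf8_len(upper)} bytes")
```

For instance, U+0149 goes from 2 to 3 bytes, whereas U+00DF (ß) becomes "SS" and stays at 2 bytes.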
> 
> For creating a text box, it's enough to write:
> 
>      perl5.28.0 -MTk -e 'tkinit->Text->pack; MainLoop'
> 
>>
>> I don't know how to proceed.  The API would have to be redesigned.  I
>> don't know if these functions are called from outside or not.  What
>> hits me first is to have the functions take a parameter like "**result".
>> Upon return, the function would have malloc'ed enough memory to hold
>> the changed result, and stored its address into *result.  It becomes
>> the caller's responsibility to free it after use.
> 
> It may be worth looking at what the original Tcl code does:
> https://www.tcl.tk/man/tcl/TclLib/ToUpper.htm
> 
> It seems that
> - it's guaranteed here that the UTF-8 string is never made longer (in terms of bytes)
> - it handles only characters in the Latin1 range, which are unlikely to cause problems (even an uppercase conversion ß -> SS would not change the UTF-8 byte length)
If we're willing to limit the inputs to Latin1, this is true, and there 
is no problem, but there needs to be a check for that.  The uppercase 
of two Latin1 characters is above Latin1, but still occupies 2 bytes.
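
A minimal sketch of such a check, written in Python rather than in the C of the actual code, and with hypothetical function names: reject anything above U+00FF before case-changing, after which the result can never be longer than the input. The two characters in question are assumed to be U+00FF (ÿ) and U+00B5 (µ).

```python
def is_latin1(utf8: bytes) -> bool:
    """True if every code point in the UTF-8 input is at most U+00FF."""
    return all(ord(ch) <= 0xFF for ch in utf8.decode("utf-8"))


def upper_latin1_only(utf8: bytes) -> bytes:
    """Uppercase UTF-8 input, refusing anything beyond Latin1.

    Hypothetical helper: with the Latin1 restriction in place, the
    uppercased result never needs more bytes than the input did.
    """
    if not is_latin1(utf8):
        raise ValueError("input contains code points above U+00FF")
    out = utf8.decode("utf-8").upper().encode("utf-8")
    assert len(out) <= len(utf8)
    return out


# Both results lie above Latin1 (U+0178 and U+039C) but are still 2 bytes each.
for ch in ("\u00ff", "\u00b5"):
    up = upper_latin1_only(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> U+{ord(up.decode('utf-8')):04X}, {len(up)} bytes")
```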
> 
> Unfortunately the new to*_utf8_safe functions don't say what happens when the available space is exceeded. Maybe this could be handled similarly to Encode.pm, where the user has some (limited) ways to specify the fallback behavior (FB_CROAK plus the possibility to set the replacement character). If that possibility existed, a quick fix would be to just use the replacement-character fallback.

They currently die, which seems too severe now that you bring it up, but 
is in keeping with various other parts of the core.  This does seem like 
a worthwhile idea you have.  Note that there might not be room for a 
replacement character, though.
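
To illustrate the room problem, here is a rough Python sketch of a lowercasing routine that must keep every character within its original byte width. The function name and the "die"/"replace" policy names are hypothetical and are not part of the Perl or Encode API. U+FFFD takes 3 bytes in UTF-8, so it cannot stand in for a 2-byte character such as U+0130, whereas a 1-byte substitute like "?" can.

```python
REPLACEMENT = "\ufffd"  # 3 bytes in UTF-8


def lower_same_size(utf8: bytes, on_overflow: str = "die",
                    replacement: str = REPLACEMENT) -> bytes:
    """Lowercase UTF-8 without letting any character outgrow its slot.

    on_overflow="die" raises as soon as a lowercased character needs
    more bytes than the original; on_overflow="replace" substitutes
    `replacement`, but only if that fits in the original's bytes.
    """
    out = bytearray()
    for ch in utf8.decode("utf-8"):
        src = ch.encode("utf-8")
        dst = ch.lower().encode("utf-8")
        if len(dst) > len(src):
            if on_overflow == "die":
                raise OverflowError(f"U+{ord(ch):04X} expands when lowercased")
            dst = replacement.encode("utf-8")
            if len(dst) > len(src):
                raise OverflowError("replacement character does not fit either")
        out += dst
    return bytes(out)


# A 1-byte replacement fits where U+FFFD would not.
print(lower_same_size("\u0130".encode("utf-8"), on_overflow="replace",
                      replacement="?"))
```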

> 
> What's also interesting --- there's no Perl warning if the Tk::Text find functionality is set to regexp+nocase. So maybe a simple fix would be to convert the search string from exact to an equivalent regexp (e.g. using something like '^'.quotemeta($string).'$').
> 
> 
> ---
> via perlbug:  queue: perl5 status: open
> https://rt.perl.org/Ticket/Display.html?id=133347
> 
