perl.perl5.porters | Postings from October 2009

Re: [perl #69973] Invalid and tainted utf-8 char crashes perl 5.10.1 in regexp evaluation

From: Vincent Pit
Date: October 23, 2009 13:53
Subject: Re: [perl #69973] Invalid and tainted utf-8 char crashes perl 5.10.1 in regexp evaluation
Message ID: 4AE217DD.1060502@profvince.com

> 2009/10/22 Mark Martinec <perlbug-followup@perl.org>:
>   
>> # New Ticket Created by  Mark Martinec
>> # Please include the string:  [perl #69973]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=69973 >
>>
>>
>>
>> This is a bug report for perl from Mark.Martinec@ijs.si,
>> generated with the help of perlbug 1.39 running under perl 5.10.1.
>>
>>
>> -----------------------------------------------------------------
>> [Please describe your issue here]
>>
>> While tracking down the reason for crashes of a perl process processing
>> certain obfuscated spam messages, I found that a UTF-8 character
>> with a large (and invalid) codepoint causes perl 5.10.1 to crash
>> when such a string is matched against a particular regular expression.
>>
>> This happens on FreeBSD 7.2, using perl as installed from ports
>> with no special settings.
>>
>> Here is the crashing application reduced to a small test case:
>>
>>
>> #!/usr/bin/perl -T
>>  use strict;
>>
>>  # Here is an HTML snippet from a malicious/obfuscated mail message.
>>  # Note the last character has an invalid and huge UTF-8 code
>>  # (as a result of an unrelated bug in HTML::Parser).
>>  #
>>  my $t = '<a>Attention Home&#959&#969n&#1257rs...1&#1109t '.
>>          'T&#1110&#1084e E&#957&#1257&#1075075</a>';
>>
>>  $t =~ s/&#(\d+)/chr($1)/ge;    # convert HTML entities to UTF8
>>  $t .= substr($ENV{PATH},0,0);  # make it tainted
>>
>>  # show character codes in the resulting string
>>  print join(", ", map {ord} split(//,$t)), "\n";
>>
>>  # The following regexp evaluation crashes perl 5.10.1 on FreeBSD.
>>  # Note that $t must be tainted and must have the UTF8 flag on,
>>  # otherwise the crash seems to be avoided.
>>
>>  $t =~ /( |\b)(http:|www\.)/i;
>>
>>     
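A minimal sketch, not part of the original report, for confirming the two preconditions named in the test case's comment (UTF8 flag on, string tainted) without running the crashing match. It reuses the report's construction on a shortened, hypothetical fragment of the string and relies on the built-in utf8::is_utf8 and on tainted from Scalar::Util. Run it as `perl -T`; without taint mode $ENV{PATH} is not tainted, so the taint check will report "no".

```perl
#!/usr/bin/perl
# Run as: perl -T check.pl  (taint mode is needed for $ENV{PATH} to be tainted)
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical shortened fragment of the HTML snippet from the report.
my $t = 'E&#957&#1257&#1075075';

# Decode numeric entities the same way the report does; chr() of a
# codepoint above 255 upgrades $t to a UTF8-flagged string.
$t =~ s/&#(\d+)/chr($1)/ge;

# Append an empty substring of $ENV{PATH}; it carries taint only under -T.
$t .= substr($ENV{PATH} // '', 0, 0);

printf "UTF8 flag: %s\n", utf8::is_utf8($t) ? 'on' : 'off';
printf "tainted:   %s\n", tainted($t) ? 'yes' : 'no';
```

Under `perl -T` both lines should report on/yes, which is the combination the report says is needed to trigger the crash in the subsequent match.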

Bisected down to 8902bb05b18c9858efa90229ca1ee42b17277554
(http://perl5.git.perl.org/perl.git/commit/8902bb05):

Author: Slaven Rezic <slaven@rezic.de>
Date:   Sun Jan 4 17:28:33 2009 +0100

    Another regexp failure with utf8-flagged string and byte-flagged pattern (reminder)
   
    Date: 17 Nov 2007 16:29:29 +0100
    Message-ID: <87r6iohova.fsf@biokovo-amd64.herceg.de>
   
    (cherry picked from commit c012444fd89eef64e1d1687642cdb9f968e96739)



Vincent.


