
Overlong UTF-8 (Re: Make Encode.pm support the real UTF-8)

From:
Gisle Aas
Date:
December 3, 2004 03:36
Subject:
Overlong UTF-8 (Re: Make Encode.pm support the real UTF-8)
Message ID:
lr4qj3fwaw.fsf_-_@caliper.activestate.com
Tim Bunce <Tim.Bunce@pobox.com> writes:

> On Wed, Dec 01, 2004 at 01:28:05PM -0800, Gisle Aas wrote:
> > As you probably know perl's version of UTF-8 is not the real thing.  I
> > thought I would hack up a patch to support the encoding as defined by
> > Unicode.  That involves rejecting illegal chars (like surrogates,
> > "\x{FFFF}" and "\x{FDD0}"), chars above 0x10FFFF, overlong sequences
> > and such.
> 
> It's worth remembering that overlong sequences are a potential security risk.

The current Encode utf8 decoder already refuses these, as this is one of
the things that perl's internal is_utf8_char() actually checks for.
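
For anyone who wants to see what "overlong" means at the byte level, here
is a rough pure-Perl sketch.  It is not how is_utf8_char() is implemented;
contains_overlong() is a name made up for illustration, and it only looks
for lead bytes that start a sequence longer than the code point needs.
The bytes F0 80 80 80 used in the test below are a four-byte encoding of
U+0000, which fits in one byte.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl or Encode): returns true if a
# byte string contains the start of an overlong UTF-8 sequence.
sub contains_overlong {
    my ($bytes) = @_;
    return 1 if $bytes =~ /[\xC0\xC1]/;        # 2-byte form of U+0000..U+007F
    return 1 if $bytes =~ /\xE0[\x80-\x9F]/;   # 3-byte form of a code point below U+0800
    return 1 if $bytes =~ /\xF0[\x80-\x8F]/;   # 4-byte form of a code point below U+10000
    return 0;
}

print contains_overlong("foo\xf0\x80\x80\x80bar\n") ? "overlong\n" : "ok\n";
print contains_overlong("caf\xc3\xa9\n")            ? "overlong\n" : "ok\n";
```

None of these byte patterns can occur in well-formed UTF-8, so an
unanchored match is enough for this purpose.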

The current encoder does not check anything so it might emit overlong
sequences.

The ':utf8' layer does not check its input and is happy to accept
overlong sequences.  It just slaps on the SvUTF8 flag.  Using the
':encoding(utf8)' layer instead will reject these, since it invokes
Encode.
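
The same check can be made without any I/O layer by calling Encode
directly.  A minimal sketch, assuming an Encode that exports decode() and
FB_CROAK; decode_or_undef() is a made-up wrapper name, and the sample
strings are mine:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: decode bytes with Encode's "utf8" encoding and
# return undef instead of dying when the input is malformed.
sub decode_or_undef {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return scalar eval { decode('utf8', $copy, FB_CROAK) };
}

my $bad  = "foo\xf0\x80\x80\x80bar\n";   # overlong sequence from the test script
my $good = "caf\xc3\xa9\n";              # well-formed two-byte sequence

print defined decode_or_undef($bad)  ? "accepted\n" : "rejected\n";
print defined decode_or_undef($good) ? "accepted\n" : "rejected\n";
```

With FB_CROAK the decoder dies on the first malformed byte, which is
stricter than the warn-and-continue default an I/O layer may use.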

While checking this out I found that Data::Dumper will actually
segfault when given overlong UTF-8.  There are probably other issues
like this to be found if you start looking.
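
Until such code is fixed, one possible guard is to validate a string
before handing it to anything written in XS.  This is only a sketch:
safe_dump() is a hypothetical wrapper, Encode::_utf8_on() is used here
merely to simulate what the ':utf8' layer does (set the flag without
checking the bytes), and I am assuming utf8::valid() reports the string
as malformed.

```perl
use strict;
use warnings;
use Encode ();
use Data::Dumper;

# Hypothetical wrapper: only call Dumper() when the scalar's internal
# buffer is consistent with its UTF-8 flag.
sub safe_dump {
    my ($data) = @_;
    return "# refusing to dump: malformed UTF-8\n" unless utf8::valid($data);
    return Dumper($data);
}

my $s = "foo\xf0\x80\x80\x80bar\n";
Encode::_utf8_on($s);    # flag the bytes as UTF-8 without validating them

print safe_dump($s);
print safe_dump("hello");
```

This avoids the crash at the cost of dropping the data, so it is a
stopgap rather than a fix for esc_q_utf8.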

bash-2.05b$ perl -v

This is perl, v5.8.6 built for i686-linux

Copyright 1987-2004, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

bash-2.05b$ cat xxx.pl
if (@ARGV) {
    print "Hi\n";
    if ($ARGV[0] eq "encoding") {
        binmode(STDIN, ':encoding(utf8)');
    }
    elsif ($ARGV[0] eq "utf8") {
        binmode(STDIN, ':utf8');
    }

    my $data = <STDIN>;

    use Data::Dumper;
    print Dumper($data);
}
else {
    print "foo\xf0\x80\x80\x80bar\n";
}
bash-2.05b$ perl xxx.pl | perl xxx.pl raw
Hi
$VAR1 = 'fooĆ°bar
';
bash-2.05b$ perl xxx.pl | perl xxx.pl encoding
Hi
utf8 "\xF0" does not map to Unicode at xxx.pl line 10.
utf8 "\xF0" does not map to Unicode at xxx.pl line 10.
$VAR1 = 'foo';
bash-2.05b$ perl xxx.pl | perl xxx.pl utf8
Hi
Segmentation fault


Valgrind says:

==17318==
==17318== Invalid write of size 1
==17318==    at 0x1B90A97B: esc_q_utf8 (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x1B90CDB0: DD_dump (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x1B90DFC5: XS_Data__Dumper_Dumpxs (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x80B4FE0: Perl_pp_entersub (in /opt/perl/5.8.6/bin/perl)
==17318==  Address 0x1BC1684B is 0 bytes after a block of size 11 alloc'd
==17318==    at 0x1B9059FF: realloc (vg_replace_malloc.c:197)
==17318==    by 0x809FB2A: Perl_safesysrealloc (in /opt/perl/5.8.6/bin/perl)
==17318==    by 0x80B7664: Perl_sv_grow (in /opt/perl/5.8.6/bin/perl)
==17318==    by 0x1B90A952: esc_q_utf8 (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==
==17318== Invalid write of size 1
==17318==    at 0x1B90A984: esc_q_utf8 (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x1B90CDB0: DD_dump (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x1B90DFC5: XS_Data__Dumper_Dumpxs (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x80B4FE0: Perl_pp_entersub (in /opt/perl/5.8.6/bin/perl)
==17318==  Address 0x1BC1684C is 1 bytes after a block of size 11 alloc'd
==17318==    at 0x1B9059FF: realloc (vg_replace_malloc.c:197)
==17318==    by 0x809FB2A: Perl_safesysrealloc (in /opt/perl/5.8.6/bin/perl)
==17318==    by 0x80B7664: Perl_sv_grow (in /opt/perl/5.8.6/bin/perl)
==17318==    by 0x1B90A952: esc_q_utf8 (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==
==17318== Invalid write of size 1
==17318==    at 0x1B90A98A: esc_q_utf8 (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x1B90CDB0: DD_dump (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x1B90DFC5: XS_Data__Dumper_Dumpxs (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==    by 0x80B4FE0: Perl_pp_entersub (in /opt/perl/5.8.6/bin/perl)
==17318==  Address 0x1BC1684D is 2 bytes after a block of size 11 alloc'd
==17318==    at 0x1B9059FF: realloc (vg_replace_malloc.c:197)
==17318==    by 0x809FB2A: Perl_safesysrealloc (in /opt/perl/5.8.6/bin/perl)
==17318==    by 0x80B7664: Perl_sv_grow (in /opt/perl/5.8.6/bin/perl)
==17318==    by 0x1B90A952: esc_q_utf8 (in /opt/perl/5.8.6/lib/5.8.6/i686-linux/auto/Data/Dumper/Dumper.so)
==17318==


