Re: find_encoding("UTF-16BE")->encode("abc") does not NUL-terminate

From: Gisle Aas
Date: October 7, 2003 07:38
Subject: Re: find_encoding("UTF-16BE")->encode("abc") does not NUL-terminate
Message ID: lroewt5d1r.fsf@caliper.activestate.com

Dan Kogai <dankogai@dan.co.jp> writes:

> Gisle,
> 
> On Tuesday, Oct 7, 2003, at 22:28 Asia/Tokyo, Gisle Aas wrote:
> > I had a bug report on the MIME::Base64 module because it kind of
> > depends on the strings passed to its encode() to be NUL-terminated.
> > This is not always the case for the strings produced by the Encode
> > module.  This program demonstrates:
> >
> >     #!perl -w
> >
> >     use Encode qw(encode find_encoding);
> >     use Devel::Peek qw(Dump);
> >
> >     Dump(encode("UTF-16BE", "abc"));
> >     Dump(find_encoding("UTF-16BE")->encode("abc"));
> >
> > With perl-5.8.1 this prints:
> >
> >     SV = PV(0x819f878) at 0x811f434
> >       REFCNT = 1
> >       FLAGS = (TEMP,POK,pPOK)
> >       PV = 0x8189060 "\0a\0b\0c"\0
> >       CUR = 6
> >       LEN = 7
> >     SV = PV(0x819f878) at 0x811f458
> >       REFCNT = 1
> >       FLAGS = (TEMP,POK,pPOK)
> >       PV = 0x8194fb0 "\0a\0b\0c"
> >       CUR = 6
> >       LEN = 6
> >
> > Note that the first form does the right thing while the second does
> > not.
> 
> In this particular case I am not sure which side is to blame, because
> a perl scalar in general does allow the second form (that's what
> SvCUR() is for, IMHO).

The following invariant should always hold if SvPOK(sv):

   - SvCUR(sv) < SvLEN(sv)
   - *SvEND(sv) == '\0'
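
As a rough illustration of those two conditions, here is a minimal C
sketch.  It does not use perl's headers: struct toy_sv and
toy_sv_invariant_ok() are hypothetical stand-ins, with buf, cur and len
playing the roles of the string pointer, SvCUR and SvLEN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a string SV (not perl's real struct):
 * buf ~ the PV pointer, cur ~ SvCUR, len ~ SvLEN. */
struct toy_sv {
    const char *buf;
    size_t cur;   /* bytes in use */
    size_t len;   /* bytes allocated */
};

/* Returns 1 when both conditions hold, 0 otherwise.
 * The cur < len test comes first so buf[cur] is only read
 * when that byte lies inside the allocation. */
static int toy_sv_invariant_ok(const struct toy_sv *sv)
{
    assert(sv != NULL && sv->buf != NULL);
    return sv->cur < sv->len && sv->buf[sv->cur] == '\0';
}
```

Under this model the first Dump above (CUR = 6, LEN = 7, trailing \0)
passes the check, and the second (CUR = 6, LEN = 6) fails it.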

The perl core ensures that and so should extensions.  The perlguts
manpage says:

       All SVs that contain strings should be terminated with a
       NUL character.  If it is not NUL-terminated there is a
       risk of core dumps and corruptions from code which passes
       the string to C functions or system calls which expect a
       NUL-terminated string.  Perl's own functions typically add
       a trailing NUL for this reason.  Nevertheless, you should
       be very careful when you pass a string stored in an SV to
       a C function or system call.
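
On the consumer side, one way to be careful is to take an explicit
byte count and never look at the byte after the end.  The sketch below
assumes a hypothetical helper, hex_encode(), chosen only for
illustration; it is not taken from MIME::Base64 or any other module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-driven consumer: writes 2*n hex digits plus a
 * terminating NUL into out, which must hold at least 2*n + 1 bytes.
 * It reads exactly n bytes of src, so it works whether or not src is
 * NUL-terminated and whether or not src contains embedded NULs. */
static void hex_encode(const unsigned char *src, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(src != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[src[i] >> 4];
        out[2 * i + 1] = digits[src[i] & 0x0f];
    }
    out[2 * n] = '\0';   /* the output itself is made NUL-terminated */
}
```

A strlen()-based version would see a zero-length string for the
UTF-16BE data above, since its first byte is already \0.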

>  The reason encode() adds a NUL is that perl internally appends
> "\0" whenever it copies a string.

Ok, that explains the difference.
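
The copy-and-terminate behaviour can be sketched in plain C as below.
copy_with_nul() is a hypothetical helper, not a perl API; real XS code
would more likely grow the SV's own buffer with SvGROW() and store the
NUL at SvEND().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n bytes of src into a fresh buffer of
 * n + 1 bytes and store a NUL after the last copied byte.
 * Returns NULL if the allocation fails; the caller frees the result. */
static char *copy_with_nul(const char *src, size_t n)
{
    char *dst;

    assert(src != NULL);
    dst = malloc(n + 1);      /* one extra byte for the terminator */
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, n);      /* embedded NULs are copied as-is */
    dst[n] = '\0';
    return dst;
}
```

For the 6-byte UTF-16BE string this yields 7 allocated bytes, matching
CUR = 6, LEN = 7 in the first Dump.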

> Though it is easy to add an extra "\0" for UTF-16 (Done by XS of
> Encode::Unicode), it is equally easy to fix MIME::Base64.

Yes, perhaps we should fix both.

> So while I promise to fix this "bug" in Encode::Unicode,  I want to
> fix and tidy other stuff before $Encode::VERSION++.  So if you are
> impatient, I would like you to have your MIME::Base64 take care of
> this.

Sure.

> After all, null-termination itself is moot w/ UTF-(16|32)(BE|LE)?.

True, but when they are stuffed in perl strings they should still be
made safe for external APIs that expect NUL-termination.

> Dan the Encode Maintainer

Gisle the MIME::Base64 Maintainer :)
