Dan Kogai <dankogai@dan.co.jp> writes:

> Gisle,
>
> On Tuesday, Oct 7, 2003, at 22:28 Asia/Tokyo, Gisle Aas wrote:
> > I had a bug report on the MIME::Base64 module because it kind of
> > depends on the strings passed to its encode() to be NUL-terminated.
> > This is not always the case for the strings produced by the Encode
> > module.  This program demonstrates:
> >
> >   #!perl -w
> >
> >   use Encode qw(encode find_encoding);
> >   use Devel::Peek qw(Dump);
> >
> >   Dump(encode("UTF-16BE", "abc"));
> >   Dump(find_encoding("UTF-16BE")->encode("abc"));
> >
> > With perl-5.8.1 this prints:
> >
> >   SV = PV(0x819f878) at 0x811f434
> >     REFCNT = 1
> >     FLAGS = (TEMP,POK,pPOK)
> >     PV = 0x8189060 "\0a\0b\0c"\0
> >     CUR = 6
> >     LEN = 7
> >   SV = PV(0x819f878) at 0x811f458
> >     REFCNT = 1
> >     FLAGS = (TEMP,POK,pPOK)
> >     PV = 0x8194fb0 "\0a\0b\0c"
> >     CUR = 6
> >     LEN = 6
> >
> > Note that the first form does the right thing while the second does
> > not.
>
> In this particular case I am not sure which side is to blame because
> perl scalar in general does allow the second form (That's what SvCUR()
> is for, IMHO).

The following invariant should always hold if SvPOK(sv):

  - SvCUR(sv) < SvLEN(sv)
  - *SvEND(sv) == '\0'

The perl core ensures that and so should extensions.  The perlguts
manpage says:

  All SVs that contain strings should be terminated with a NUL
  character.  If it is not NUL-terminated there is a risk of core
  dumps and corruptions from code which passes the string to C
  functions or system calls which expect a NUL-terminated string.
  Perl's own functions typically add a trailing NUL for this reason.
  Nevertheless, you should be very careful when you pass a string
  stored in an SV to a C function or system call.

> The reason why encode() adds null string is that perl
> internally adds "\0" whenever it copies string.

Ok, that explains the difference.

> Though it is easy to add an extra "\0" for UTF-16 (Done by XS of
> Encode::Unicode), it is equally easy to fix MIME::Base64.

Yes, perhaps we should fix both.

> So while I promise to fix this "bug" in Encode::Unicode, I want to
> fix and tidy other stuff before $Encode::VERSION++.  So if you are
> impatient, I would like you to have your MIME::Base64 take care of
> this.

Sure.

> After all, null-termination itself is moot w/ UTF-(16|32)(BE|LE)?.

True, but when they are stuffed in perl strings they should still be
made safe for external APIs that expect NUL-termination.

> Dan the Encode Maintainer

Gisle the MIME::Base64 Maintainer :)
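
To make the invariant concrete, here is a minimal sketch in plain C.
It does not use perl's real headers: StrBuf and the three strbuf_*
functions are hypothetical names invented for illustration, with pv,
cur and len standing in for the PV, CUR and LEN fields that
Devel::Peek prints.  It only shows the check (cur < len and a NUL at
pv[cur]) and one way to restore it when the buffer is exactly full.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the string part of an SV.  This is an
 * illustration only, not perl's actual struct layout. */
typedef struct {
    char  *pv;   /* start of the buffer (PV) */
    size_t cur;  /* bytes in use, not counting a trailing NUL (CUR) */
    size_t len;  /* bytes allocated (LEN) */
} StrBuf;

/* Returns 1 if cur < len and the byte at pv[cur] is NUL, else 0. */
int strbuf_invariant_holds(const StrBuf *s)
{
    return s->pv != NULL && s->cur < s->len && s->pv[s->cur] == '\0';
}

/* Build a buffer holding exactly n bytes with no room for a
 * terminator, like the second dump (CUR = 6, LEN = 6).
 * Returns 0 on success, -1 if allocation fails. */
int strbuf_init_exact(StrBuf *s, const char *bytes, size_t n)
{
    s->pv = malloc(n ? n : 1);
    if (s->pv == NULL)
        return -1;
    memcpy(s->pv, bytes, n);
    s->cur = n;
    s->len = n;
    return 0;
}

/* Grow the buffer by one byte if needed and write the trailing NUL.
 * Returns 0 on success, -1 if realloc fails (buffer left unchanged). */
int strbuf_terminate(StrBuf *s)
{
    if (s->cur >= s->len) {
        char *grown = realloc(s->pv, s->cur + 1);
        if (grown == NULL)
            return -1;
        s->pv  = grown;
        s->len = s->cur + 1;
    }
    s->pv[s->cur] = '\0';
    assert(strbuf_invariant_holds(s));
    return 0;
}
```

Calling strbuf_init_exact() with the six bytes "\0a\0b\0c" gives a
buffer for which strbuf_invariant_holds() is false; after
strbuf_terminate() the content is unchanged, len is 7, and the check
passes.  In real XS code the corresponding step would presumably be
SvGROW(sv, SvCUR(sv) + 1) followed by *SvEND(sv) = '\0', but treat
that as a pointer to check against perlguts rather than as the actual
Encode::Unicode or MIME::Base64 fix.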