develooper Front page | perl.perl5.porters | Postings from October 2003

Re: find_encoding("UTF-16BE")->encode("abc") does not NUL-terminate

From: Dan Kogai
Date: October 7, 2003 07:14
Subject: Re: find_encoding("UTF-16BE")->encode("abc") does not NUL-terminate
Message ID: 820A2425-F8D0-11D7-B584-000393AE4244@dan.co.jp

Gisle,

On Tuesday, Oct 7, 2003, at 22:28 Asia/Tokyo, Gisle Aas wrote:
> I had a bug report on the MIME::Base64 module because it kind of
> depends on the strings passed to its encode() to be NUL-terminated.
> This is not always the case for the strings produced by the Encode
> module.  This program demonstrates:
>
>     #!perl -w
>
>     use Encode qw(encode find_encoding);
>     use Devel::Peek qw(Dump);
>
>     Dump(encode("UTF-16BE", "abc"));
>     Dump(find_encoding("UTF-16BE")->encode("abc"));
>
> With perl-5.8.1 this prints:
>
>     SV = PV(0x819f878) at 0x811f434
>       REFCNT = 1
>       FLAGS = (TEMP,POK,pPOK)
>       PV = 0x8189060 "\0a\0b\0c"\0
>       CUR = 6
>       LEN = 7
>     SV = PV(0x819f878) at 0x811f458
>       REFCNT = 1
>       FLAGS = (TEMP,POK,pPOK)
>       PV = 0x8194fb0 "\0a\0b\0c"
>       CUR = 6
>       LEN = 6
>
> Note that the first form does the right thing while the second does 
> not.

In this particular case I am not sure which side is to blame, because a 
perl scalar in general does allow the second form (that's what SvCUR() 
is for, IMHO).  The reason the encode() function returns a 
NUL-terminated string is that perl internally appends "\0" whenever it 
copies a string, and encode() copies the result of the method call:

sub encode($$;$)
{
     my ($name, $string, $check) = @_;
     return undef unless defined $string;
     $check ||=0;
     my $enc = find_encoding($name);
     unless(defined $enc){
         require Carp;
         Carp::croak("Unknown encoding '$name'");
     }
     my $octets = $enc->encode($string,$check); # HERE! #
     return undef if ($check && length($string));
     return $octets;
}
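
For anyone who wants to look at the CUR and LEN fields from Perl without reading Dump() output, here is a minimal sketch using the core B module. The helper name cur_len() is made up for illustration, and the LEN values it reports depend on the perl and Encode versions in use, so only CUR is treated as meaningful here.

```perl
use strict;
use warnings;
use B ();
use Encode qw(encode find_encoding);

# Hypothetical helper: return (CUR, LEN) for the string scalar behind $ref.
# CUR is the number of bytes in use (what SvCUR() reports in C);
# LEN is the size of the allocated buffer.
sub cur_len {
    my ($ref) = @_;
    my $sv = B::svref_2object($ref);
    return ($sv->CUR, $sv->LEN);
}

my $via_function = encode("UTF-16BE", "abc");
my $via_method   = find_encoding("UTF-16BE")->encode("abc");

# Note: assigning to a lexical may itself copy the buffer, so LEN here
# need not match what Dump() shows for the temporaries in the report.
for my $pair ([function => \$via_function], [method => \$via_method]) {
    my ($label, $ref) = @$pair;
    my ($cur, $len) = cur_len($ref);
    printf "%-8s CUR = %d, LEN = %d\n", $label, $cur, $len;
}
```

Either way, CUR is the value that code receiving the scalar can rely on for the string length.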

Though it is easy to add an extra "\0" for UTF-16 (that would be done 
in the XS of Encode::Unicode), it is equally easy to fix MIME::Base64.

So while I promise to fix this "bug" in Encode::Unicode, I want to fix 
and tidy up other things before $Encode::VERSION++.  So if you are 
impatient, I would like you to have MIME::Base64 take care of this.

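After all, NUL-termination itself is moot with UTF-(16|32)(BE|LE)?, 
since the encoded octets legitimately contain NUL bytes and a C-style 
string scan would stop at the first one.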
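
To illustrate why a trailing NUL says little for these encodings, here is a small sketch (variable names are only for illustration). It shows that the UTF-16BE octets for "abc" already start with a NUL byte, so anything that treats the buffer as a C string sees an empty string, while the length perl tracks is still correct.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $octets = encode("UTF-16BE", "abc");   # the bytes "\0a\0b\0c"

# perl knows the real length because it is stored alongside the buffer.
my $length = length($octets);             # 6

# The first NUL byte is at offset 0, long before the end of the data.
my $first_nul = index($octets, "\0");     # 0

# unpack's "Z*" template reads up to the first NUL, much like a C string
# function would, so it yields an empty string here.
my ($c_view) = unpack("Z*", $octets);

printf "length = %d, first NUL at %d, C-style view has %d bytes\n",
    $length, $first_nul, length($c_view);
```

So a consumer of such data has to go by the stored length in any case, whether or not an extra "\0" follows the last byte.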

> Regards,

Ditto.

Dan the Encode Maintainer

