develooper Front page | perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

Nick Ing-Simmons
February 20, 2001 04:55
Simon Cozens <> writes:
>On Mon, Feb 19, 2001 at 09:53:14PM -0500, Andrew Pimlott wrote:
>> Let me say first that the reason all of my pseudo-code has been "OO
>> crap" is that I'm trying to make it as painfully clear as I can
>> think to (yes, emphasis on pain).  Since nobody else has yet
>> proposed any specific interfaces,
>What about the one we've got?
>> I'm saying you call an explicit function, eg to_utf8(), which gives
>> you back a string such that if you say "substr $str, 0, 1", you get
>> the first byte of the UTF-8 representation, and "length $str" is the
>> length of the UTF-8 representation.  Period.
>I wonder if this would work:
>    sub to_utf8 {
>        use bytes;
>        return $_[0]
>    }

That only works if $_[0] is already UTF-8 encoded.
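
A minimal sketch of why, assuming a modern perl where the core utf8::upgrade() is available. The helper name bytes_view_length is hypothetical and only reports what the bytes pragma sees; it is not the quoted to_utf8. The same character gives different byte counts depending on how perl happens to store it internally:

```perl
use strict;
use warnings;

# Hypothetical helper: length of the argument as seen under "use bytes".
sub bytes_view_length {
    use bytes;                 # length() now counts bytes of the internal buffer
    return length $_[0];
}

my $native   = "\xe9";         # e-acute, stored internally as one Latin-1 byte
my $upgraded = $native;
utf8::upgrade($upgraded);      # same character, internal storage is now UTF-8

print bytes_view_length($native),   "\n";   # 1
print bytes_view_length($upgraded), "\n";   # 2
```

Only the upgraded copy shows the two-byte UTF-8 form, so a bytes-based to_utf8 depends on the internal state of its argument.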

use Encode qw(utf8_encode);
sub to_utf8 {
 my $copy = $_[0];
 utf8_encode($copy);  # upgrade, then turn SvUTF8_off
 return $copy;
}

That does work. Personally I would rather the functions in Encode returned the
modified value instead of munging in place.
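
To illustrate the two styles, here is a small sketch that assumes a current perl and a current Encode rather than the Encode under discussion in this thread: the core utf8::encode() rewrites its argument, while Encode's encode_utf8() returns a new value.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $text = "caf\xe9";            # 4 characters

# In-place style: utf8::encode() rewrites its argument.
my $in_place = $text;
utf8::encode($in_place);         # $in_place now holds 5 UTF-8 octets

# Value-returning style: encode_utf8() leaves $text untouched.
my $octets = encode_utf8($text);

print length($text), " ", length($in_place), " ", length($octets), "\n";   # 4 5 5
```

With the in-place form the caller has to make the copy; with the returning form the original string is never at risk.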

The documented interface might be:

use Encode qw(from_to);
sub to_utf8 {
 my $copy = $_[0];
 my $length = from_to($copy,'Unicode','UTF-8');
 return $copy;
}

It is unclear what value there is to the _perl_ API in returning the length.
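
A short sketch of the point, assuming a current Encode and Latin-1 input (the 'Unicode' source name used above is not assumed here): from_to() converts the octets in place and returns their new length, which is the same number length() gives afterwards.

```perl
use strict;
use warnings;
use Encode qw(from_to);

my $octets = "caf\xe9";                               # Latin-1 octets
my $len    = from_to($octets, 'iso-8859-1', 'UTF-8'); # converts in place

print "$len ", length($octets), "\n";   # 5 5
```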

Nick Ing-Simmons <>
Via, but not speaking for: Texas Instruments Ltd.
