Re: The State of The Unicode

From:
Nick Ing-Simmons
Date:
February 20, 2001 04:55
Subject:
Re: The State of The Unicode
Message ID:
200102201255.MAA23696@mikado.tiuk.ti.com
Simon Cozens <simon@netthink.co.uk> writes:
>On Mon, Feb 19, 2001 at 09:53:14PM -0500, Andrew Pimlott wrote:
>> Let me say first that the reason all of my pseudo-code has been "OO
>> crap" is that I'm trying to make it as painfully clear as I can
>> think to (yes, emphasis on pain).  Since nobody else has yet
>> proposed any specific interfaces,
>
>What about the one we've got?
>
>> I'm saying you call an explicit function, eg to_utf8(), which gives
>> you back a string such that if you say "substr $str, 0, 1", you get
>> the first byte of the UTF-8 representation, and "length $str" is the
>> length of the UTF-8 representation.  Period.
>
>I wonder if this would work:
>
>    sub to_utf8 {
>        use bytes;
>        return $_[0];
>    }

That only works if $_[0] is already UTF-8 encoded internally.
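A small sketch of why, assuming Perl 5.8 or later (where `utf8::upgrade` is core): under `use bytes`, `length` reports the size of Perl's internal buffer, which equals the UTF-8 length only when the string is already stored as UTF-8. The helper name `internal_byte_length` is hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the string's internal buffer.
# Under "use bytes", length() counts internal bytes, not characters.
sub internal_byte_length {
    my ($s) = @_;    # copying preserves the SvUTF8 flag
    use bytes;
    return length $s;
}

my $latin1   = "caf\xe9";    # 4 characters, stored as 4 bytes (SvUTF8 off)
my $upgraded = $latin1;
utf8::upgrade($upgraded);    # same 4 characters, now stored as UTF-8

print internal_byte_length($latin1),   "\n";   # 4: not the UTF-8 length
print internal_byte_length($upgraded), "\n";   # 5: \xe9 takes 2 bytes in UTF-8
print length($upgraded),               "\n";   # 4: character length unchanged
```

Both strings compare equal with `eq`; only the internal storage differs, which is exactly what `use bytes` exposes.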

use Encode qw(utf8_encode);
sub to_utf8
{
 my $copy = $_[0];
 utf8_encode($copy);  # upgrade to UTF-8, then turn SvUTF8 off
 return $copy;
}

That does work. Personally, I would rather the functions in Encode returned the
modified value than munged it in place.
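A minimal sketch of both styles using interfaces from later Perl releases (an assumption: core `utf8::encode` for the in-place form and `Encode::encode_utf8` for the value-returning form, rather than the 2001 `utf8_encode` shown above, which later Encode releases may not export):

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Copy-then-encode wrapper around core utf8::encode, which converts its
# argument to UTF-8 octets in place and turns the SvUTF8 flag off.
sub to_utf8 {
    my ($copy) = @_;     # copy first so the caller's string is untouched
    utf8::encode($copy);
    return $copy;
}

my $str        = "caf\xe9";
my $via_core   = to_utf8($str);
my $via_encode = encode_utf8($str);   # Encode's value-returning form

# length is now the UTF-8 length, and substr sees individual bytes.
print length($str), " ", length($via_core), " ", length($via_encode), "\n";  # 4 5 5
printf "%d\n", ord substr($via_core, 3, 1);   # 195 (0xC3), first byte of \xe9
```

`encode_utf8` already returns a new value, so no wrapper is needed on the Encode side.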

Using the documented interface, it might be:

use Encode qw(from_to);
sub to_utf8
{
 my $copy = $_[0];
 my $length = from_to($copy,'Unicode','UTF-8');  # converts in place, returns the new length
 return $copy;
}

It is unclear what value returning the length adds to the _perl_ API.
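For comparison, a sketch with a current Encode release (assumption: the encoding names 'iso-8859-1' and 'utf-8' stand in for the 'Unicode' source above, which may not resolve in later versions). It shows that the returned length merely duplicates `length($data)`; according to the Encode documentation the only extra information is `undef` on error.

```perl
use strict;
use warnings;
use Encode qw(from_to);

my $data = "caf\xe9";                              # ISO-8859-1 octets
my $len  = from_to($data, 'iso-8859-1', 'utf-8');  # converts $data in place

# $data now holds the UTF-8 octets "caf\xc3\xa9"; $len is its new length
# in octets, the same value length($data) returns.
print defined $len ? "length $len\n" : "conversion failed\n";   # length 5
print length($data), "\n";                                      # 5
```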

-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.