develooper Front page | perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

From: Nick Ing-Simmons
Date: February 20, 2001 05:21
Subject: Re: The State of The Unicode
Message ID: 200102201321.NAA23740@mikado.tiuk.ti.com

Simon Cozens <simon@netthink.co.uk> writes:
>OK, the subroutine I gave wasn't correct (I only said I wondered if it would
>work) but the principle is there: if you feed it UTF8 encoded data, you get a
>byte string back. If you feed it non-UTF8-encoded data, you get a byte string
>back. Which is What You Meant. How nice.
>
>> that you haven't thought through.
>
>This isn't going to endear you, you realise? I've thought this through
>enough to produce the code to get the current Unicode model working, and
>working very well. How much have *you* thought it through?

Let's not get grumpy either way, you guys. Andrew has thought it through
far enough to realise 'use bytes' does not help.

But the model works fine, and almost everything one needs is there.
There are rough spots - the API of Encode::* functions is not ideal
for perl use, and legacy code that expects to do something like

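sub my_encrypt
{
 # List the byte values that unpack sees in the argument.
 my @list = unpack('C*',$_[0]);
 foreach my $byte (@list)
  {
   # print, not printf: the value is data, not a format string
   print "$byte\n";
  }
}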
  
my $str = "AB£".chr(256);
chop($str);

my_encrypt("AB£");
my_encrypt($str);

gets a surprising extra 194 (the lead byte of the UTF-8 encoding of £)
because unpack('C*', ...) does not downgrade the string first,
but we are very close.
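
One way legacy code could protect itself from this, as a minimal sketch only: downgrade a copy of the string before unpacking it, so the byte list does not depend on whether the string happens to be stored as UTF-8 internally. The helper name byte_values is hypothetical and not from this thread, and the sketch assumes a perl that provides utf8::downgrade. The pound sign is written as \xA3 so the example does not depend on the encoding of the source file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the byte values
# of a string, downgrading a copy first so that the internal UTF-8 form
# is never what unpack looks at.
sub byte_values {
    my ($copy) = @_;    # work on a copy; the caller's string is untouched

    # With a true second argument, utf8::downgrade returns false instead
    # of dying when the string holds a character above 255.
    utf8::downgrade($copy, 1)
        or die "string contains characters above 255\n";

    return unpack('C*', $copy);
}

my $plain = "AB\xA3";               # \xA3 is the pound sign in Latin-1
my $str   = "AB\xA3" . chr(256);    # appending chr(256) upgrades the string
chop($str);                         # logically "AB\xA3" again, still upgraded

print join(' ', byte_values($plain)), "\n";    # 65 66 163
print join(' ', byte_values($str)),   "\n";    # 65 66 163
```

Both calls give the same three values, with no 194. A string that really does contain a character above 255 makes the helper die rather than silently hand back its UTF-8 bytes, which seems the safer default for code that believes it is handling bytes.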


-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.

