
Encode - cleanup and issues

February 24, 2001 09:46

Attached is the pod from //depot/perlio/ext/Encode/...@8923

I made changes in the branch so Jarkko can decide if this is 
what we want before merging.
I think I have implemented what is documented, and documented what
isn't implemented yet. I may have missed things.

(Change 8923 also touches one or two other spots to make internals
more useful to Encode.)

The document is close to Karsten Sperling's perl strings document
in most respects (Encode is only part of that discussion).

I would appreciate comments on the document, and the following open issues:

1. The coderef as CHECK - we need to agree on what arguments it gets
   and how it returns its results. (My thoughts are in the pod.)

2. "Wide char encodings" - when encoding into 16-bit encodings
   (e.g. jis208, gb2312 or UCS-2) the output is currently an octet sequence
   in big-endian order ('cos that's what X11 wants - even on x86).

   But I can imagine wanting to process that sequence by logical number
   in the encoded space. (This is uninteresting for UCS-2 as it is what
   we had to start with - might be interesting for UTF-16 for surrogate
   pairs.)

   We could do that by (INTERNALS stuff) UTF-8 encoding each resulting
   16-bit value and setting the UTF-8 flag on the result with SvUTF8_on.

   One could then use ord() on the resulting string to pick out values.
   Is this interesting? 
   Is it better than unpack('S',...)? 
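
To make the two views in item 2 concrete, here is a rough sketch that compares them. It uses UTF-16BE because that is where surrogates show up. The sample text and variable names are assumptions for illustration only, and the "character view" is emulated in plain Perl with chr() rather than in the internals, so it shows only what the proposal would look like from the caller's side.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text (an assumption): U+3042 plus U+1F600, which needs a surrogate pair.
my $text = "\x{3042}\x{1F600}";

# Octet view: what encode() returns today, 16-bit units in big-endian order.
my $octets = encode('UTF-16BE', $text);
my @bytes  = map { sprintf '%02X', ord $_ } split //, $octets;

# unpack view: 'n' is always big-endian; 'S' is native order and would
# byte-swap these values on a little-endian host such as x86.
my @units = unpack 'n*', $octets;

# Character view (emulated): one character per 16-bit value, so ord()
# sees whole units instead of single octets.
my $wide = do {
    no warnings 'utf8';    # older perls may warn about lone surrogates
    join '', map { chr $_ } @units;
};
my @ords = map { ord $_ } split //, $wide;

printf "octets: %s\n", join ' ', @bytes;
printf "unpack: %s\n", join ' ', map { sprintf '%04X', $_ } @units;
printf "ord():  %s\n", join ' ', map { sprintf '%04X', $_ } @ords;
```

With this input the octet view is 30 42 D8 3D DE 00, and both unpack('n*', ...) and the ord() view give 3042 D83D DE00. Note that unpack('S', ...) only matches on a big-endian host, so the 'n' template is the safer comparison.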

Nick Ing-Simmons
