
Re: [rt.cpan.org #73623] [perl #107326] perl's unicode conversion fails when iconv succeeds

From: Eric Brine
Date: January 4, 2012 14:09
Subject: Re: [rt.cpan.org #73623] [perl #107326] perl's unicode conversion fails when iconv succeeds
Message ID: CALJW-qFU3pu+wJNq5_vu=2X4uNS27Ct9Xopq-Gkp1oOt22_Ppw@mail.gmail.com

On Wed, Jan 4, 2012 at 4:34 PM, Linda Walsh <perl-diddler@tlinx.org> wrote:

> Eric Brine wrote:
>
> On Tue, Jan 3, 2012 at 6:09 PM, Linda Walsh <perl-diddler@tlinx.org> wrote:
>
>> If you are networking, you use network byte order. If you are doing
>> processing on the same machine, you use native byte order.
>>
>> To do otherwise is to incur horrible inefficiencies.
>>
>
> Reading UTF-16le:
>
> UV c;
> c = *(p++);
> c |= *(p++) << 8;
>
> ----
>
> Wouldn't your target be a buffer pointer?
>

No. Perl doesn't use arrays of code points. Even if it did, that wouldn't
change anything.

// UTF-16le
UV* c = ...;
*c = *(p++);
*(c++) |= *(p++) << 8;

is not any more efficient than

// UTF-16be
UV* c = ...;
*c = *(p++) << 8;
*(c++) |= *(p++);
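
To make the comparison concrete, here is a minimal standalone sketch in plain C. The helper names read_u16le and read_u16be are hypothetical and are not taken from Perl or Encode, and uint16_t stands in for UV. Each helper performs two byte loads, one shift, and one OR, so neither byte order has an edge when a code unit is assembled byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not Perl or Encode code. */

/* Assemble one 16-bit code unit stored low byte first (UTF-16LE). */
static uint16_t read_u16le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Assemble one 16-bit code unit stored high byte first (UTF-16BE). */
static uint16_t read_u16be(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Both functions work on any host, regardless of its native byte order, because they never reinterpret the buffer as a wider type.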

> Except that if the count is large, or greater than 4 (normal case) on
> LE machines, you do 4 at a time and skip the shifts (<<) and ors (|):

You can't do that for the first 0..3 characters because of alignment issues.

You can't do that for the last 0..3 characters because of boundary issues.

You can't do that since UTF-16 is a variable width format. (You are
incorrectly creating two characters in the destination buffer where there
is only one.)
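
The variable-width point can be illustrated with a small sketch, assuming a plain byte buffer and a hypothetical function name (utf16le_next); this is not how Encode is written. A code point outside the BMP occupies two 16-bit code units (a surrogate pair), so a bulk copy of 16-bit units into a code point buffer would yield two entries where there should be one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: reads one code point from UTF-16LE bytes.
 * Returns the number of bytes consumed (2 or 4), or 0 on truncated or
 * malformed input. */
static size_t utf16le_next(const uint8_t *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len < 2)
        return 0;                       /* truncated code unit */

    uint32_t hi = (uint32_t)s[0] | ((uint32_t)s[1] << 8);
    if (hi < 0xD800 || hi > 0xDFFF) {   /* ordinary BMP code point */
        *cp = hi;
        return 2;
    }
    if (hi > 0xDBFF || len < 4)
        return 0;                       /* lone low surrogate, or truncated pair */

    uint32_t lo = (uint32_t)s[2] | ((uint32_t)s[3] << 8);
    if (lo < 0xDC00 || lo > 0xDFFF)
        return 0;                       /* high surrogate without a low one */

    /* Combine the pair into a single supplementary code point. */
    *cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    return 4;
}
```

For example, the four bytes 3D D8 00 DE decode to the single code point U+1F600, not to two characters.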

> *(q++)=*((unsigned int)q++) for any count >=8

Alignment error (not counting the missing "*").
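
As a hedged aside on the alignment point: dereferencing a byte pointer through a cast to unsigned int * is only safe when the address happens to be suitably aligned. A portable way to load four bytes from an arbitrary address is memcpy into a local variable, sketched below with a hypothetical helper name (load_u32_native). The value it returns is in the host's native byte order, so it does not by itself decode UTF-16 or UTF-32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 4 bytes from a possibly unaligned address.
 * memcpy avoids the undefined behaviour of a misaligned pointer cast;
 * compilers commonly reduce it to a single load where that is legal. */
static uint32_t load_u32_native(const uint8_t *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```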

---

This is the code. Note how UTF-16le ('v') is no faster than UTF-16be ('n').

static UV
enc_unpack(pTHX_ U8 **sp, U8 *e, STRLEN size, U8 endian)
{
    U8 *s = *sp;
    UV v = 0;
    if (s+size > e) {
        croak("Partial character %c",(char) endian);
    }
    switch(endian) {
    case 'N':                   /* 32-bit big-endian: two high bytes here, */
        v = *s++;
        v = (v << 8) | *s++;
        /* FALLTHROUGH */       /* then the two low bytes below */
    case 'n':                   /* 16-bit big-endian */
        v = (v << 8) | *s++;
        v = (v << 8) | *s++;
        break;
    case 'V':                   /* 32-bit little-endian */
    case 'v':                   /* 16-bit little-endian */
        v |= *s++;
        v |= (*s++ << 8);
        if (endian == 'v')
            break;
        v |= (*s++ << 16);
        v |= (*s++ << 24);
        break;
    default:
        croak("Unknown endian %c",(char) endian);
        break;
    }
    *sp = s;
    return v;
}

