
Re: pack and ASCII

From: Eric Brine
Date: January 9, 2012 11:55
Subject: Re: pack and ASCII
Message ID: CALJW-qECSuqNfPvvWK-Ueu8BbCHbGqY_gVS06DU_=vMtA7Tjmg@mail.gmail.com

On Mon, Jan 9, 2012 at 11:36 AM, demerphq <demerphq@gmail.com> wrote:

> On 9 January 2012 17:27, Leon Timmermans <fawaka@gmail.com> wrote:
> > On Mon, Jan 9, 2012 at 4:46 PM, John P. Linderman (jpl)
> > <jpl@research.att.com> wrote:
> >> I was more concerned that the documentation suggested that pack/unpack
> >> would only work on ASCII strings, not on arbitrary strings.  Granted, the
> >> length associated with "a" and "A" might need some amplification, similar
> >> to what is there for the "length" function.  If pack/unpack cannot deal
> >> with non-ASCII strings (I know they work ok for bytes with the high-order
> >> bit on), then what happens when the corresponding argument includes
> >> non-ASCII characters?  -- jpl
> >
> > Well, your question led me to discover what I consider to be a bug:
> >
> > perl -E 'use utf8; my $packed = pack "A*", "ţ"; say utf8::is_utf8($packed);'
> > 1
>
> Interesting. I think I agree that this is a bug.
>

C<< pack 'A*' >> correctly packs all strings of bytes (whether UTF8=0 or
UTF8=1). I don't see why it's a bug that it usefully works for characters
that aren't bytes too. Are you using C<< pack 'A*' >> to validate your
data? You can use one of the following to do that:

    /[^\x00-\xFF]/ and die "Expecting string of bytes";

    utf8::downgrade($_, 1) or die "Expecting string of bytes";
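
A minimal sketch of the points above, assuming a reasonably modern perl (5.10 or later). The helper name is_byte_string is hypothetical and not from this thread. It packs the same bytes stored with UTF8=0 and with UTF8=1 and compares the results, then runs both validation checks against U+0163, the character from Leon's example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true when every character
# in the string has an ordinal of 255 or less.
sub is_byte_string {
    my ($s) = @_;
    return $s !~ /[^\x00-\xFF]/;
}

# The same four bytes in two internal representations.
my $down = "caf\xE9";
my $up   = "caf\xE9";
utf8::downgrade($down);    # stored with UTF8=0
utf8::upgrade($up);        # stored with UTF8=1

# eq compares characters, so the internal representation is irrelevant.
my $same = pack('A*', $down) eq pack('A*', $up);

# U+0163 does not fit in a byte, so both checks reject it.
my $wide       = "\x{163}";
my $copy       = $wide;                        # downgrade modifies its argument
my $downgraded = utf8::downgrade($copy, 1);    # FAIL_OK: false instead of dying

# pack 'A*' still accepts it and hands back a one-character string.
my $packed_wide = pack 'A*', $wide;

printf "same result for both representations: %s\n", $same ? "yes" : "no";
printf "byte string passes regex check:       %s\n", is_byte_string($down) ? "yes" : "no";
printf "U+0163 passes regex check:            %s\n", is_byte_string($wide) ? "yes" : "no";
printf "U+0163 can be downgraded:             %s\n", $downgraded ? "yes" : "no";
printf "length of packed U+0163:              %d\n", length $packed_wide;
```

The regex check leaves the string untouched, while utf8::downgrade changes the internal representation of its argument on success, which is why the sketch works on a copy.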

> > Personally, it makes no sense to me to pass pack "A" a character
> > string in the first place, if only because it relies on the internal
> > encoding of perl for its result, but to return a character string is
> > just plain wrong.
>
> A and in particular Z are there so you can easily create data
> structures in Perl which can be passed into C/XS.
>

Such XS code expects the scalar to contain a string of bytes, so it should
use SvPVbyte instead of SvPVutf8. If it does this correctly, it will get
the bytes it expects regardless of how the scalar is stored internally. If
you attempt to pass garbage (character 0x0163), SvPVbyte will croak with a
"Wide character" error.
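
The XS macros cannot be called from pure Perl, but utf8::downgrade without its FAIL_OK argument is documented to die when the string cannot be represented in bytes. As far as I know it goes through the same downgrade step, so the sketch below is a Perl-level stand-in for what XS code would see, not a test of the XS macro itself. The variable names are illustrative only, and the exact wording of the error may vary between perl versions.

```perl
use strict;
use warnings;

my $bytes = "abc\xFF";       # every character fits in a byte
my $wide  = "abc\x{163}";    # contains U+0163, which does not

# Without FAIL_OK, utf8::downgrade dies instead of returning false.
my $bytes_ok = eval { utf8::downgrade($bytes); 1 } ? 1 : 0;

my $wide_ok = eval { utf8::downgrade($wide); 1 } ? 1 : 0;
my $error   = $wide_ok ? '' : $@;

print "byte string downgraded: ", ($bytes_ok ? "yes" : "no"), "\n";
print "wide string downgraded: ", ($wide_ok  ? "yes" : "no"), "\n";
print "error: $error" if $error;
```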

- Eric
