develooper Front page | perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

From: Eric Brine
Date: January 12, 2012 08:15
Subject: Re: pack and ASCII
Message ID: CALJW-qGLXithH7XxmCgWa4EFBHZGrx-J2r25vD8KSEAH7-Jf+g@mail.gmail.com

On Thu, Jan 12, 2012 at 7:40 AM, demerphq <demerphq@gmail.com> wrote:

> On 12 January 2012 02:45, Eric Brine <ikegami@adaelis.com> wrote:
> > No, not at all. I don't care at all how Perl chooses to store a string in an
> > SV. **If one has code that depends on how Perl stores a string in an SV, it
> > suffers from The Unicode Bug.**
>
> I really think you are overstating things, which is muddling this
> discussion.
>
> Consider:
>
> $ perl -MDevel::Peek -MEncode -le'Dump encode_utf8(chr(1000))'
> SV = PV(0x8dae758) at 0x8db0818
>  REFCNT = 1
>  FLAGS = (TEMP,POK,pPOK)
>  PV = 0x8dc21b8 "\317\250"\0
>  CUR = 2
>  LEN = 4
>
> Why is *that* string not marked UTF8 on? Because if it were it would
> be *useless*.
>

> There are lots of cases where it is absolutely reasonable to care
> about how Perl is storing a string.


Yes, which is why I never said otherwise.

I said the bug occurs when one has "code that depends on how Perl stores a string".
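
As a minimal sketch of what "depends on how Perl stores a string" can look like: the two variables below hold the same one-character string, and only the internal storage differs. The variable names are illustrative, and the script assumes `use feature 'unicode_strings'` is not in effect (it is enabled implicitly by `use v5.12` or later).

```perl
use strict;
use warnings;

# Two copies of the same one-character string (code point 233).
my $native   = chr(233);
my $upgraded = chr(233);
utf8::upgrade($upgraded);    # changes the internal storage only

# Perl considers them the same string...
print "same string:        ", ($native eq $upgraded ? "yes" : "no"), "\n";

# ...but they are stored differently.
print "native is_utf8:     ", (utf8::is_utf8($native)   ? 1 : 0), "\n";
print "upgraded is_utf8:   ", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";

# Without 'unicode_strings', \w may answer differently for the two copies.
# Code that relies on either answer depends on the storage format.
print "native matches \\w:   ", ($native   =~ /\w/ ? "yes" : "no"), "\n";
print "upgraded matches \\w: ", ($upgraded =~ /\w/ ? "yes" : "no"), "\n";
```

If the last two lines disagree, the regex result is being decided by the storage format rather than by the string's content.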

People might very well care, and specifically, they'll care for performance
reasons.


> At the very least because Perl at
> times cares how Perl is storing a string.
>

File names (not fixed yet) and bitwise operators (won't fix).

> $ perl -MEncode -le'my $unicode= chr(1000); my $bytes=
> Encode::encode_utf8($unicode); if (shift) { print $unicode } else {
> print $bytes; }'
> Ϩ
> $ perl -MEncode -le'my $unicode= chr(1000); my $bytes=
> Encode::encode_utf8($unicode); if (shift) { print $unicode } else {
> print $bytes; }' 1
> Wide character in print at -e line 1.
> Ϩ
>

Not only do you get a warning, you get garbage if you change 1000 to 233:
the character is then printed as the lone byte E9, which is not valid UTF-8.
I'm not sure why you're showing me this, but this is a perfect illustration
of buggy code.
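
A small sketch of the 233 case, assuming in-memory filehandles stand in for a terminal or file with no encoding layer (the handle and variable names are illustrative). It compares printing the character directly with printing its explicitly encoded form.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $char = chr(233);    # one character, code point 233

# Print the character to a handle that has no encoding layer.
my $raw_out = '';
open(my $raw, '>', \$raw_out) or die "open failed: $!";
print {$raw} $char;
close $raw;

# Print the explicitly encoded bytes instead.
my $enc_out = '';
open(my $enc, '>', \$enc_out) or die "open failed: $!";
print {$enc} encode_utf8($char);
close $enc;

# Show the bytes that actually reached each handle.
print "unencoded: ", unpack("H*", $raw_out), "\n";
print "encoded:   ", unpack("H*", $enc_out), "\n";
```

The first handle receives a single byte and no warning is issued, so the problem is silent for code points below 256.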

> And pack is used to inter-operate with other code and systems


(s/is used/can be used/. Did you even look up U and W and A and a?)

But yes, I HAVE NO OBJECTION TO DOWNGRADING WHEN POSSIBLE!

Speaking of inter-operating with the system, this is how one interoperates
with Windows:

$text = chr(2660);
encode("UCS-2le", pack("a$buf_size", $text))
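
A self-contained sketch of the same two steps, assuming a buffer size of 4 characters (`$buf_size` is not given a value in the message, so that value is hypothetical). It pads the text to a fixed number of characters with `pack`, then encodes the result.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $buf_size = 4;            # hypothetical buffer size, in characters
my $text     = chr(2660);    # a character above 255, as in the message

# pack "a" pads the text with NULs up to $buf_size characters.
my $padded = pack("a$buf_size", $text);

# Only at this point are the characters turned into bytes for the system.
my $bytes = encode("UCS-2LE", $padded);

print "padded length: ", length($padded), "\n";
print "byte length:   ", length($bytes),  "\n";
```

Nothing here inspects or relies on how `$text` or `$padded` is stored internally.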

> thus one is
> *absolutely, without ANY debate* allowed, and indeed *encouraged* to
> know what format a string is in.
>

Not true. It's not like you can pass an SV to the system. Your typemap
should use SvPVbyte to get the string from your SV.
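
SvPVbyte is an XS-level (C) macro, so the following is only a rough Perl-level analogue, using `utf8::downgrade` on a copy to obtain bytes regardless of how the original is stored. Treating `utf8::downgrade` as a stand-in for SvPVbyte is an assumption made for illustration.

```perl
use strict;
use warnings;

my $text = chr(233);
utf8::upgrade($text);        # internal storage is now UTF-8

# Work on a downgraded copy, whatever the original's storage was.
my $bytes = $text;
utf8::downgrade($bytes);     # dies if a character is above 255

print "length: ", length($bytes), "\n";
print "hex:    ", unpack("H*", $bytes), "\n";

# A string with a character above 255 has no byte form to hand over.
my $copy = chr(1000);
my $ok   = utf8::downgrade($copy, 1);    # second argument: fail quietly
print "wide string downgradable: ", ($ok ? "yes" : "no"), "\n";
```

The caller never needs to know which storage format `$text` had; a string that cannot be represented as bytes has to be encoded first.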

> So please drop this "If one has code that depends on how Perl stores a
> string in an SV" argument. It is simply wrong, and it is not helpful
> to keep repeating it over and over.
>

So show me an example where that's not true.

- Eric
