Re: pack and ASCII

From: demerphq
Date: January 12, 2012 08:37
Subject: Re: pack and ASCII
Message ID: CANgJU+W5Ya-xjPZLV9sXyH9HkuEhdxOgu8GHtJYK_V0pHS6gig@mail.gmail.com

On 12 January 2012 17:15, Eric Brine <ikegami@adaelis.com> wrote:
> On Thu, Jan 12, 2012 at 7:40 AM, demerphq <demerphq@gmail.com> wrote:
>>
>> On 12 January 2012 02:45, Eric Brine <ikegami@adaelis.com> wrote:
>> > No, not at all. I don't care at all how Perl chooses to store a string in
>> > an SV. **If one has code that depends on how Perl stores a string in an
>> > SV, it suffers from The Unicode Bug.**
>>
>> I really think you are overstating things, which is muddling this
>> discussion.
>>
>> Consider:
>>
>> $ perl -MDevel::Peek -MEncode -le'Dump encode_utf8(chr(1000))'
>> SV = PV(0x8dae758) at 0x8db0818
>>  REFCNT = 1
>>  FLAGS = (TEMP,POK,pPOK)
>>  PV = 0x8dc21b8 "\317\250"\0
>>  CUR = 2
>>  LEN = 4
>>
>> Why is *that* string not marked UTF8 on? Because if it were it would
>> be *useless*.
>>
>>
>> There are lots of cases where it is absolutely reasonable to care
>> about how Perl is storing a string.
>
>
> Yes, which is why I never said otherwise.

OK, if you don't consider "If one has code that depends on how Perl
stores a string in an SV" to be saying otherwise, then it seems
unproductive to continue this conversation at all.

> I said the bug occurs when "code that depends on how Perl stores a string".
>
> People might very well care, and specifically, they'll care for performance
> reasons.
>
>>
>> At the very least because Perl at
>> times cares how Perl is storing a string.
>
>
> File names (not fixed yet) and bitwise operators (won't fix).

Output. Input. Data-exchange.

>> $ perl -MEncode -le'my $unicode= chr(1000); my $bytes=
>> Encode::encode_utf8($unicode); if (shift) { print $unicode } else {
>> print $bytes; }'
>> Ϩ
>> $ perl -MEncode -le'my $unicode= chr(1000); my $bytes=
>> Encode::encode_utf8($unicode); if (shift) { print $unicode } else {
>> print $bytes; }' 1
>> Wide character in print at -e line 1.
>> Ϩ
>
>
> Not only do you get an error message, you get garbage if you change 1000 to
> 233. I'm not sure why you're showing me this, but this is a perfect
> illustration of buggy code.

UMM? WHAT?!

$ perl -MDevel::Peek -MEncode -le'my $unicode= chr(233); my $bytes=
Encode::encode_utf8($unicode); if (shift) { print $unicode } else {
print $bytes; }' 0
é

Which part do you consider buggy?
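
For reference, here is a minimal sketch of what each branch of that
one-liner hands to the output handle when the character is chr(233). It
assumes a handle with no encoding layer, and it uses an in-memory handle
in place of STDOUT so the bytes can be inspected. The helper name
printed_bytes is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return the bytes that print() writes to a handle
# that has no encoding layer. An in-memory handle stands in for STDOUT.
sub printed_bytes {
    my ($string) = @_;
    open my $fh, '>', \my $buffer or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close $fh;
    return $buffer;
}

# Render a byte string as space-separated hex for display.
sub hex_dump {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $octets;
}

my $unicode = chr(233);                # one character, U+00E9
my $bytes   = encode_utf8($unicode);   # the UTF-8 encoding of U+00E9

my $from_unicode = printed_bytes($unicode);
my $from_bytes   = printed_bytes($bytes);

printf "print \$unicode wrote: %s\n", hex_dump($from_unicode);
printf "print \$bytes   wrote: %s\n", hex_dump($from_bytes);
```

Only the second byte sequence is valid UTF-8, so a UTF-8 terminal displays
it as é; what the first one looks like depends on what the terminal expects.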

>> And pack is used to inter-operate with other code and systems
>
>
> (s/is used/can be used/. Did you even look up U and W and A and a?)
>
> But yes, I HAVE NO OBJECTION TO DOWNGRADING WHEN POSSIBLE!

Downgrading is NOT the same thing as putting UTF-8 encoded data in a
Perl scalar that does not have the UTF8 flag on.
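
A small sketch of the distinction, using only the core utf8:: functions
and Encode::encode_utf8. The variable names are just for illustration.
Downgrading changes the internal storage and leaves the string's value
alone, while encoding produces a different string whose characters are
the UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $text = chr(233);
utf8::upgrade($text);              # force the UTF8-flagged internal form

my $downgraded = $text;
utf8::downgrade($downgraded);      # same string value, flag now off

my $encoded = encode_utf8($text);  # a different string: two UTF-8 bytes

# chr(1000) does not fit in one byte per character, so it cannot be
# downgraded. The true second argument asks for a false return value
# instead of an exception.
my $wide = chr(1000);
my $could_downgrade = utf8::downgrade($wide, 1) ? 1 : 0;

printf "downgraded: length %d, same value as original: %s, UTF8 flag: %s\n",
    length($downgraded),
    ($downgraded eq $text ? 'yes' : 'no'),
    (utf8::is_utf8($downgraded) ? 'on' : 'off');

printf "encoded:    length %d, same value as original: %s, UTF8 flag: %s\n",
    length($encoded),
    ($encoded eq $text ? 'yes' : 'no'),
    (utf8::is_utf8($encoded) ? 'on' : 'off');

printf "chr(1000) could be downgraded: %s\n", $could_downgrade ? 'yes' : 'no';
```

Both results have the flag off, but only the encoded one holds UTF-8 data.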

>
> Speaking of inter-operating with system, this is how one interoperates with
> Windows:
>
> $text = chr(2660);
> encode("UCS-2le", pack("a$buf_size", $text))

Er, maybe on Win2k, but certainly not on XP or later. Try UTF-16LE there.
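
A rough sketch of where the two encodings differ, using Encode's
"UCS-2LE" and "UTF-16LE". The sample characters are my own choice. For a
character inside the Basic Multilingual Plane both give the same two
bytes. UCS-2 has no surrogate pairs, so a character above U+FFFF only
fits in UTF-16. The UCS-2LE attempt is wrapped in eval with FB_CROAK so
that it reports a failure instead of substituting something silently.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated hex for display.
sub hex_dump {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $octets;
}

my $bmp    = chr(1000);      # U+03E8, inside the Basic Multilingual Plane
my $astral = chr(0x1F600);   # U+1F600, outside it

my $bmp_ucs2     = encode('UCS-2LE',  $bmp);
my $bmp_utf16    = encode('UTF-16LE', $bmp);
my $astral_utf16 = encode('UTF-16LE', $astral);   # a surrogate pair

# Ask Encode to die rather than substitute if the character does not fit.
my $astral_ucs2 = eval {
    encode('UCS-2LE', $astral, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};

printf "U+03E8  as UCS-2LE:  %s\n", hex_dump($bmp_ucs2);
printf "U+03E8  as UTF-16LE: %s\n", hex_dump($bmp_utf16);
printf "U+1F600 as UTF-16LE: %s\n", hex_dump($astral_utf16);
printf "U+1F600 as UCS-2LE:  %s\n",
    defined $astral_ucs2 ? hex_dump($astral_ucs2) : 'refused by Encode';
```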

>>   thus one is
>>
>> *absolutely, without ANY debate* allowed, and indeed *encouraged* to
>> know what format a string is in.
>
>
> Not true. It's not like you can pass an SV to the system. Your typemap
> should use SvPVbyte to get the string from your SV.

my $output_string= encode_utf8($some_text);
print $output_string;

>
>> So please drop this "If one has code that depends on how Perl stores a
>> string in an SV" argument. It is simply wrong, and it is not helpful
>> to keep repeating it over and over.
>
>
> So show me an example where that's not true.

I already did. If you don't recognize them then I can only assume that
you and I are using a different set of definitions for some words that
are pretty critical to this discussion.

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
