perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

From: demerphq
Date: January 12, 2012 09:45
Subject: Re: pack and ASCII
Message ID: CANgJU+VwkMDMjBKPULrVVNHU8P=j9E-Dcjz98zj8EptM_ecs6w@mail.gmail.com

On 12 January 2012 18:21, Eric Brine <ikegami@adaelis.com> wrote:
> On Thu, Jan 12, 2012 at 11:36 AM, demerphq <demerphq@gmail.com> wrote:
>>
>> >> There are lots of cases where it is absolutely reasonable to care
>> >> about how Perl is storing a string.
>> >
>> > Yes, which is why I never said otherwise.
>>
>> OK, if you don't consider "If one has code that depends on how Perl
>> stores a string in an SV" to be otherwise, then it seems
>> unproductive to continue this conversation at all.
>
>
> People's preference for the UTF8 flag and their choice whether to use
> SvPVbyte are not the same thing.
>
> You can have a preference for the UTF8 flag while still properly using
> SvPVbyte.

I think SvPVbyte has minimal to no relevance to this discussion.

>>
>> > I said the bug occurs in "code that depends on how Perl stores a
>> > string".
>> >
>> > People might very well care, and specifically, they'll care for
>> > performance reasons.
>> >
>> >>
>> >> At the very least because Perl at
>> >> times cares how Perl is storing a string.
>> >
>> > File names (not fixed yet) and bitwise operators (won't fix).
>>
>> Output. Input. Data-exchange.
>
>
> Forgive me if I misunderstand this, since you didn't write a complete
> sentence. Are you saying that Perl cares about the UTF8 flag when doing
> IO and data-exchange?

Yes, clearly Perl cares about the UTF8 flag when doing IO.

And I also mean that the *programmer* may care about the utf8 flag
when doing IO.

> Perl doesn't care about the UTF8 flag for IO. An upgraded E9 is treated
> exactly the same as a downgraded E9, and anything above FF results in an
> error unless you add an encoding layer.

$ perl -wle'print chr(1000)'
Wide character in print at -e line 1.
Ϩ

proves you wrong.

> As for data-exchange with the system, the only time Perl cares is for file
> names, which p5p already acknowledged is a bug.

See my previous response.

>> Which part do you consider buggy?
>
>
> The part you didn't run the second time around. $ARGV[0]==1 gives an error
> for chr(1000) and gives garbage with chr(233).

So in the case where I didn't care how Perl stored the string, it is wrong.

And in the case where I did care how Perl stored the string, it is right?

Aren't you contradicting yourself?

>
>
>>
>> >> And pack is used to inter-operate with other code and systems
>> >
>> > (s/is used/can be used/. Did you even look up U and W and A and a?)
>> >
>> > But yes, I HAVE NO OBJECTION TO DOWNGRADING WHEN POSSIBLE!
>>
>> Downgrading is NOT the same thing as putting UTF-8 data in a Perl
>> scalar with the UTF8 flag off.
>
>
> I don't know what you mean by that. Are you suggesting that C<< pack "A" >>
> should do UTF-8 encoding?

No, I'm saying that pack "A" should not return a string with the UTF8 flag on.

>
>>
>> >> thus one is *absolutely, without ANY debate* allowed, and indeed
>> >> *encouraged*, to know what format a string is in.
>> >
>> > Not true. It's not like you can pass an SV to the system. Your typemap
>> > should use SvPVbyte to get the string from your SV.
>>
>> my $output_string= encode_utf8($some_text);
>> print $output_string;
>
>
> C<print> uses SvPVbyte or similar, so I have no idea what you're trying
> to show me with that.

$output_string contains UTF-8, but with the UTF8 flag OFF.

>
> $ perl -e'$_=chr(0xE9); utf8::downgrade($_); print' | od -t x1
> 0000000 e9
> 0000001
>
> $ perl -e'$_=chr(0xE9); utf8::upgrade($_); print' | od -t x1
> 0000000 e9
> 0000001
>
> And C<print> gives an error if you pass something invalid (like chr(1000)),
> so it's an example of a function that doesn't suffer from The Unicode Bug.
>
>> >> So please drop this "If one has code that depends on how Perl stores a
>> >> string in an SV" argument. It is simply wrong, and it is not helpful
>> >> to keep repeating it over and over.
>> >
>> > So show me an example where that's not true.
>>
>> I already did. If you don't recognize them, then I can only assume that
>> you and I are using different definitions for some words that
>> are pretty critical to this discussion.
>
>
> The only example you showed demonstrates that print works as I said things
> should work: downgrade when bytes are expected, give an error if not possible.

Downgrading is not the same as accessing the raw bytes in a string.

Just because bytes are expected does not mean that the right way to
get them is to _downgrade_ the string.

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
