perl.perl5.porters
Postings from January 2012
Re: pack and ASCII
January 12, 2012 09:45
Message ID: CANgJU+VwkMDMjBKPULrVVNHU8P=j9E-Dcjz98zj8EptM_ecs6w@mail.gmail.com
On 12 January 2012 18:21, Eric Brine <email@example.com> wrote:
> On Thu, Jan 12, 2012 at 11:36 AM, demerphq <firstname.lastname@example.org> wrote:
>> >> There are lots of cases where it is absolutely reasonable to care
>> >> about how Perl is storing a string.
>> > Yes, which is why I never said otherwise.
>> OK, if you don't consider "If one has code that depends on how Perl
>> stores a string in an SV" to be otherwise, then it seems
>> unproductive to continue this conversation at all.
> People's preference for the UTF8 flag and their choice whether or not to
> use SvPVbyte are not the same thing.
> You can have a preference for the UTF8 flag while still properly use
I think SvPVbyte has minimal to no relevance to this discussion.
>> > I said the bug occurs when "code that depends on how Perl stores a
>> > string".
>> > People might very well care, and specifically, they'll care for
>> > performance reasons.
>> >> At the very least because Perl at
>> >> times cares how Perl is storing a string.
>> > File names (not fixed yet) and bitwise operators (won't fix).
>> Output. Input. Data-exchange.
> Forgive me if I misunderstand this because you didn't make a sentence. Are
> you saying that Perl cares about the UTF8 flag when doing IO and
Yes, clearly Perl cares about the UTF8 flag when doing IO.
And I also mean that the *programmer* may care about the utf8 flag
when doing IO.
> Perl doesn't care about the UTF8 flag for IO. An upgraded E9 is treated
> exactly the same as a downgraded E9, and anything above FF results in an
> error unless you add an encoding layer.
$ perl -wle'print chr(1000)'
Wide character in print at -e line 1.
proves you wrong.
> As for data-exchange with the system, the only time Perl cares is for file
> names, which p5p already acknowledged is a bug.
See my previous response.
>> Which part do you consider buggy?
> The part you didn't run the second time around. $ARGV==1 gives an error
> for chr(1000) and gives garbage with chr(233).
So the case where I didn't care how Perl stored the string is wrong,
and the case where I did care how Perl stored the string is right?
Aren't you contradicting yourself?
>> >> And pack is used to interoperate with other code and systems
>> > (s/is used/can be used/. Did you even look up U and W and A and a?)
>> > But yes, I HAVE NO OBJECTION TO DOWNGRADING WHEN POSSIBLE!
>> Downgrading is NOT the same thing as putting UTF-8 encoded data in a
>> Perl scalar with the UTF8 flag off.
> I don't know what you mean by that. Are you suggesting that C<< pack "A" >>
> should do UTF-8 encoding?
No, I'm saying that pack "A" should not return a string with the UTF8 flag on.
>> >> thus one is
>> >> *absolutely, without ANY debate* allowed, and indeed *encouraged* to
>> >> know what format a string is in.
>> > Not true. It's not like you can pass an SV to the system. Your typemap
>> > should use SvPVbyte to get the string from your SV.
>> my $output_string= encode_utf8($some_text);
>> print $output_string;
> C<print> uses SvPVbyte or similar, so I have no idea what you're showing me
$output_string contains UTF-8 encoded data, but with the UTF8 flag OFF.
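To make that concrete, a minimal sketch using only core Encode and the built-in utf8::is_utf8 (the sample text is made up for illustration): encode_utf8() returns the UTF-8 octets of the text, and the resulting scalar has the UTF8 flag off even though its content is UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical sample text: 6 characters, two of them above 0x7F.
my $some_text     = "caf\x{E9} \x{263A}";
my $output_string = encode_utf8($some_text);   # UTF-8 octets of the text

# The content is UTF-8 encoded, yet the UTF8 flag on the result is off.
printf "chars=%d octets=%d flag=%s\n",
    length($some_text), length($output_string),
    utf8::is_utf8($output_string) ? "on" : "off";
```

Printing $output_string therefore writes the UTF-8 bytes as-is, with no "Wide character" warning, because Perl sees a byte string.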
> $ perl -e'$_=chr(0xE9); utf8::downgrade($_); print' | od -t x1
> 0000000 e9
> $ perl -e'$_=chr(0xE9); utf8::upgrade($_); print' | od -t x1
> 0000000 e9
> And C<print> gives an error if you pass something invalid (like chr(1000)),
> so it's an example of a function that doesn't suffer from The Unicode Bug.
>> >> So please drop this "If one has code that depends on how Perl stores a
>> >> string in an SV" argument. It is simply wrong, and it is not helpful
>> >> to keep repeating it over and over.
>> > So show me an example where that's not true.
>> I already did. If you don't recognize them, then I can only assume that
>> you and I are using a different set of definitions for some words that
>> are pretty critical to this discussion.
> The only example you showed demonstrates that print works as I said things should work:
> downgrade when bytes are expected, give an error if not possible.
Downgrading is not the same as accessing the raw bytes in a string.
Just because bytes are expected does not mean that the right way to
get them is to _downgrade_ the string.
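A minimal sketch of the difference, using only the built-in utf8:: functions (the hex_bytes helper and the two sample characters are just for illustration): utf8::downgrade gives one byte per character and only succeeds when every character fits in a byte, while utf8::encode always yields the UTF-8 bytes of the string.

```perl
use strict;
use warnings;

# Illustrative helper: render a byte string as hex, e.g. "c3 a9".
sub hex_bytes { join ' ', map { sprintf '%02x', ord } split //, $_[0] }

for my $str (chr(0xE9), chr(1000)) {
    # Downgrade: one byte per character; only possible if every char <= 0xFF.
    my $down = $str;
    my $ok   = utf8::downgrade($down, 1);   # FAIL_OK: return false instead of dying

    # Encode: the UTF-8 bytes of the characters; always possible.
    my $enc = $str;
    utf8::encode($enc);                     # in place; UTF8 flag is off afterwards

    printf "U+%04X  downgrade: %-6s  utf8::encode: %s\n",
        ord($str), ($ok ? hex_bytes($down) : 'fails'), hex_bytes($enc);
}
```

For chr(0xE9) the two give different bytes (e9 versus c3 a9), and for chr(1000) only encoding works (cf a8), which is why "bytes are expected" does not by itself mean "downgrade".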
perl -Mre=debug -e "/just|another|perl|hacker/"