Re: pack and ASCII
From: Eric Brine
January 12, 2012 09:21
Message ID: CALJW-qHyMVFFyhp6Rompk_PiOcKbT3UobHo+oJqgyTdgHVyWemail@example.com
On Thu, Jan 12, 2012 at 11:36 AM, demerphq <firstname.lastname@example.org> wrote:
> >> There are lots of cases where it is absolutely reasonable to care
> >> about how Perl is storing a string.
> > Yes, which is why I never said otherwise.
> Ok, if you don't consider "If one has code that depends on how Perl
> stores a string in an SV" to be otherwise then it seems
> unproductive to consider this conversation at all.
People's preference for the UTF8 flag and their choice to use SvPVbyte or
not are two different things.
You can have a preference for the UTF8 flag while still properly using
SvPVbyte.
> > I said the bug occurs when "code that depends on how Perl stores a
> > People might very well care, and specifically, they'll care for
> > reasons.
> >> At the very least because Perl at
> >> times cares how Perl is storing a string.
> > File names (not fixed yet) and bitwise operators (won't fix).
> Output. Input. Data-exchange.
Forgive me if I misunderstand this, because you didn't write a sentence. Are
you saying that Perl cares about the UTF8 flag when doing IO and
data exchange?
Perl doesn't care about the UTF8 flag for IO. An upgraded E9 is treated
exactly the same as a downgraded E9, and anything above FF results in an
error unless you add an encoding layer.
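To make the IO claim concrete, here is a minimal sketch. It assumes an in-memory filehandle is a fair stand-in for a real one, and C<printed_bytes> is a hypothetical helper name, not something from this thread. It prints the same character in both internal forms and compares the bytes that come out, then prints a character above FF through an encoding layer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw bytes print() writes for $str.
# $layer is an optional PerlIO layer string such as ':encoding(UTF-8)'.
sub printed_bytes {
    my ($str, $layer) = @_;
    $layer = '' unless defined $layer;
    my $buf = '';
    open my $fh, ">$layer", \$buf
        or die "cannot open in-memory handle: $!";
    print {$fh} $str;
    close $fh;
    return $buf;
}

# The same character, stored two different ways.
my $down = chr(0xE9); utf8::downgrade($down);
my $up   = chr(0xE9); utf8::upgrade($up);

printf "downgraded E9: %s\n", unpack('H*', printed_bytes($down));
printf "upgraded E9:   %s\n", unpack('H*', printed_bytes($up));

# Above FF, an encoding layer decides which bytes are written.
printf "chr(1000) via layer: %s\n",
    unpack('H*', printed_bytes(chr(1000), ':encoding(UTF-8)'));
```

Both E9 lines should show the same single byte, which is the point being made: the internal storage does not leak into the output.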
As for data-exchange with the system, the only time Perl cares is for file
names, which p5p already acknowledged is a bug.
Which part do you consider buggy?
The part you didn't run the second time around. $ARGV==1 gives an error
for chr(1000) and gives garbage with chr(233).
> >> And pack is used to inter-operate with other code and systems
> > (s/is used/can be used/. Did you even look up U and W and A and a?)
> > But yes, I HAVE NO OBJECTION TO DOWNGRADING WHEN POSSIBLE!
> Downgrading is NOT the same thing as putting utf8 data in a non
> utf8_on perl scalar.
I don't know what you mean by that. Are you suggesting that C<< pack "A" >>
should do UTF-8 encoding?
> >> thus one is
> >> *absolutely, without ANY debate* allowed, and indeed *encouraged* to
> >> know what format a string is in.
> > Not true. It's not like you can pass an SV to the system. Your typemap
> > should use SvPVbyte to get the string from your SV.
> my $output_string= encode_utf8($some_text);
> print $output_string;
C<print> uses SvPVbyte or similar, so I have no idea what you're showing me.
Both of these write the same single byte, E9:
$ perl -e'$_=chr(0xE9); utf8::downgrade($_); print' | od -t x1
$ perl -e'$_=chr(0xE9); utf8::upgrade($_); print' | od -t x1
And C<print> gives an error if you pass something invalid (like chr(1000)),
so it's an example of a function that doesn't suffer from The Unicode Bug.
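The contrast between encoding first and printing a wide character directly can be sketched as follows. This assumes an in-memory handle with no encoding layer, and C<print_and_collect> is a hypothetical helper. In the perls I am aware of, the complaint arrives as a "Wide character" warning, which the sketch captures through C<$SIG{__WARN__}>.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: print $str to a layer-less in-memory handle and
# return the bytes written followed by any warnings raised on the way.
sub print_and_collect {
    my ($str) = @_;
    my $buf = '';
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open my $fh, '>', \$buf
        or die "cannot open in-memory handle: $!";
    print {$fh} $str;
    close $fh;
    return ($buf, @warnings);
}

# Encoded explicitly: print receives plain bytes and has nothing to object to.
my ($bytes, @w) = print_and_collect(encode_utf8(chr(1000)));
printf "encoded:   bytes=%s warnings=%d\n", unpack('H*', $bytes), scalar @w;

# Not encoded: chr(1000) cannot be represented as a single byte.
(undef, @w) = print_and_collect(chr(1000));
printf "unencoded: warnings=%d\n", scalar @w;
print  "  $w[0]" if @w;
```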
> >> So please drop this "If one has code that depends on how Perl stores a
> >> string in an SV" argument. It is simply wrong, and it is not helpful
> >> to keep repeating it over and over.
> > So show me an example where that's not true.
> I already did. If you don't recognize them then I can only assume that
> you and I are using a different set of definitions for some words that
> are pretty critical to this discussion.
The only example showed that print works as I said things should work:
downgrade when bytes are expected, give an error if not possible.
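For code that needs bytes, that rule can be sketched at the Perl level like this. It is only an analogue of what SvPVbyte does in XS, not its implementation, and C<bytes_or_die> is a hypothetical name. It relies on C<utf8::downgrade> with a true second argument, which returns false instead of dying when the string cannot be downgraded.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: return $str in its byte form, or die if it holds
# a character above 0xFF. The internal storage of the input is irrelevant.
sub bytes_or_die {
    my ($str) = @_;    # work on a copy so the caller's scalar is untouched
    utf8::downgrade($str, 1)
        or croak "string contains a character above 0xFF";
    return $str;
}

my $up = chr(0xE9);
utf8::upgrade($up);

# An upgraded E9 comes back as the single byte E9.
printf "upgraded E9 -> %s\n", unpack('H*', bytes_or_die($up));

# chr(1000) has no byte form, so the helper refuses it.
my $ok = eval { bytes_or_die(chr(1000)); 1 };
print "chr(1000) rejected\n" unless $ok;
```

A caller written this way never has to ask how the string was stored; it either gets bytes or an exception.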