develooper Front page | perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

From:
Eric Brine
Date:
January 12, 2012 09:21
Subject:
Re: pack and ASCII
Message ID:
CALJW-qHyMVFFyhp6Rompk_PiOcKbT3UobHo+oJqgyTdgHVyW+w@mail.gmail.com
On Thu, Jan 12, 2012 at 11:36 AM, demerphq <demerphq@gmail.com> wrote:

> >> There are lots of cases where it is absolutely reasonable to care
> >> about how Perl is storing a string.
> >
> > Yes, which is why I never said otherwise.
>
> Ok, if you don't consider "If one has code that depends on how Perl
> stores a string in an SV" to be otherwise, then it seems
> unproductive to consider this conversation at all.
>

People's preference regarding the UTF8 flag and their choice of whether to use
SvPVbyte are not the same thing.

You can have a preference for the UTF8 flag while still properly using
SvPVbyte.
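At the Perl level, the closest analogue to what SvPVbyte does is utf8::downgrade with its FAIL_OK argument. The sketch below is hypothetical (as_bytes is a made-up name, not an existing API): it returns the byte form of a string no matter how the SV stores it internally, and dies when a character above 0xFF makes that impossible, roughly the way SvPVbyte croaks when it can't downgrade.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte representation of $str,
# regardless of whether Perl stores it upgraded or downgraded,
# or die if $str contains a character above 0xFF.
sub as_bytes {
    my ($str) = @_;
    my $copy = $str;                     # leave the caller's SV untouched
    utf8::downgrade($copy, 1)            # FAIL_OK: return false instead of dying
        or die "Wide character: string cannot be represented as bytes\n";
    return $copy;
}

# An upgraded E9 still yields the single byte E9.
my $up = chr(0xE9);
utf8::upgrade($up);
my $bytes = as_bytes($up);
printf "%d byte(s): %s\n", length $bytes, unpack('H*', $bytes);

# A character above FF has no byte form, so this dies.
my $ok = eval { as_bytes(chr(1000)); 1 };
print $ok ? "no error\n" : "error: $@";
```

The point is that the caller never needs to know which format the SV is in; the flag only decides how much work the downgrade has to do.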


> > I said the bug occurs when "code that depends on how Perl stores a
> > string".
> >
> > People might very well care, and specifically, they'll care for
> > performance reasons.
> >
> >>
> >> At the very least because Perl at
> >> times cares how Perl is storing a string.
> >
> > File names (not fixed yet) and bitwise operators (won't fix).
>
> Output. Input. Data-exchange.
>

Forgive me if I misunderstand this, since you didn't write a full sentence. Are
you saying that Perl cares about the UTF8 flag when doing IO and
data-exchange?

Perl doesn't care about the UTF8 flag for IO. An upgraded E9 is treated
exactly the same as a downgraded E9, and anything above FF results in an
error unless you add an encoding layer.
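To make the IO point concrete, here is a minimal sketch, assuming perl 5.8 or later for in-memory filehandles and PerlIO layers (the helper name bytes_written is just for illustration). It prints the same U+00E9 string, once downgraded and once upgraded, to a raw in-memory handle and dumps the resulting bytes, then writes chr(1000) through an :encoding(UTF-8) layer.

```perl
use strict;
use warnings;

# The same one-character string (U+00E9) in both internal formats.
my $down = chr(0xE9);
utf8::downgrade($down);    # stored as the single byte E9, UTF8 flag off
my $up = chr(0xE9);
utf8::upgrade($up);        # stored as C3 A9 internally, UTF8 flag on

# Illustrative helper: print a string to a raw in-memory handle
# and return the bytes that were actually written.
sub bytes_written {
    my ($str) = @_;
    my $buf = '';
    open my $fh, '>:raw', \$buf or die "open: $!";
    print {$fh} $str;
    close $fh;
    return $buf;
}

my $out_down = bytes_written($down);
my $out_up   = bytes_written($up);
printf "downgraded: %s\n", unpack('H*', $out_down);
printf "upgraded:   %s\n", unpack('H*', $out_up);

# Characters above FF need an explicit encoding layer.
my $wide = '';
open my $enc, '>:encoding(UTF-8)', \$wide or die "open: $!";
print {$enc} chr(1000);
close $enc;
printf "chr(1000) via :encoding(UTF-8): %s\n", unpack('H*', $wide);
```

Both E9 strings should come out as the single byte e9 (the same result as the od runs further down), and chr(1000) as the UTF-8 bytes cf a8.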

As for data-exchange with the system, the only time Perl cares is for file
names, which p5p already acknowledged is a bug.

> Which part do you consider buggy?
>

The part you didn't run the second time around. With $ARGV[0]==1, it gives an
error for chr(1000) and garbage for chr(233).


> >> And pack is used to inter-operate with other code and systems
> >
> > (s/is used/can be used/. Did you even look up U and W and A and a?)
> >
> > But yes, I HAVE NO OBJECTION TO DOWNGRADING WHEN POSSIBLE!
>
> Downgrading is NOT the same thing as putting utf8 data in a non
> utf8_on perl scalar.
>

I don't know what you mean by that. Are you suggesting that C<< pack "A" >>
should do UTF-8 encoding?
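A small illustration of the distinction, assuming perl 5.10 or later (where pack works on characters rather than on the internal buffer): C<< pack "A*" >> copies the character through unchanged, while UTF-8 encoding is a separate, explicit step via Encode's encode_utf8.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $text = chr(0xE9);          # one character, U+00E9

# pack "A*" copies characters as-is; it does not UTF-8-encode them.
my $packed = pack('A*', $text);

# Producing UTF-8 bytes is a separate, explicit step.
my $encoded = encode_utf8($text);

printf "pack A*:     length %d, hex %s\n", length $packed,  unpack('H*', $packed);
printf "encode_utf8: length %d, hex %s\n", length $encoded, unpack('H*', $encoded);
```

If pack "A" did UTF-8 encoding implicitly, the two lines would agree; they don't, which is the behaviour being defended here.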


> >> thus one is
> >>
> >> *absolutely, without ANY debate* allowed, and indeed *encouraged* to
> >> know what format a string is in.
> >
> > Not true. It's not like you can pass an SV to the system. Your typemap
> > should use SvPVbyte to get the string from your SV.
>
> my $output_string= encode_utf8($some_text);
> print $output_string;
>

C<print> uses SvPVbyte or something similar, so I have no idea what you're
trying to show me with that.

$ perl -e'$_=chr(0xE9); utf8::downgrade($_); print' | od -t x1
0000000 e9
0000001

$ perl -e'$_=chr(0xE9); utf8::upgrade($_); print' | od -t x1
0000000 e9
0000001

And C<print> gives an error if you pass something it can't represent as bytes
(like chr(1000)), so it's an example of a function that doesn't suffer from
The Unicode Bug.

> >> So please drop this "If one has code that depends on how Perl stores a
> >> string in an SV" argument. It is simply wrong, and it is not helpful
> >> to keep repeating it over and over.
> >
> > So show me an example where that's not true.
>
> I already did. If you don't recognize them, then I can only assume that
> you and I are using different definitions for some words that
> are pretty critical to this discussion.
>

The only example you gave shows that print works the way I said things should
work: downgrade when bytes are expected, and give an error if that isn't
possible.

- Eric



nntp.perl.org: Perl Programming lists via nntp and http.