perl.perl5.porters | Postings from August 2013

Re: [perl #119239] started out as doc clarification needed in'eval...but...

From: Ricardo Signes
Date: August 14, 2013 02:10
Subject: Re: [perl #119239] started out as doc clarification needed in'eval...but...
Message ID: 20130814021015.GB6043@cancer.codesimply.com
* Linda Walsh <perlbug-followup@perl.org> [2013-08-12T01:25:21]
> 1) Doesn't it, at all, depend on the context of where it is 
> called?  I.e. if "use utf8", is in effect, and I say:

All that utf8.pm does is indicate that your source code is encoded in UTF-8, so
that if your source document has this:

  use utf8;
  my $band = "Queensrÿche";

...the length will be 11, not 12, because the "ÿ" will be one codepoint (U+00FF,
decoded from the file) rather than two (the two octets of its UTF-8 encoding as
stored in the file).
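A minimal sketch of that measurement in one script, assuming the script file
itself is saved as UTF-8.  Because "use utf8" is lexically scoped, it can be
confined to a block; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8, so "ÿ" is stored as octets 0xC3 0xBF.
my $raw = "Queensrÿche";       # no 'use utf8' here: the literal keeps both octets

my $decoded;
{
    use utf8;                  # from here to the end of the block, source is decoded
    $decoded = "Queensrÿche";  # "ÿ" becomes the single code point U+00FF
}

printf "without use utf8: %d\n", length $raw;       # 12
printf "with use utf8:    %d\n", length $decoded;   # 11
printf "U+%04X\n", ord substr($decoded, 7, 1);      # U+00FF
```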

Perl's design is that, as much as possible, performing texty operations on
strings treats the strings as strings of Unicode codepoints.  The utf8 pragma
is not meant to do anything but tell perl(1) to decode the input document.

Some functions in perl have, historically, behaved based on weird guesses or
bad heuristics as to context.  This is "The Unicode Bug."  It was fixed for many
operations by "use feature 'unicode_strings'", telling even more of the
language "seriously, perl, it's all text."
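A small sketch of two operations the feature changes, using only core perl.
The string "\xE9" (é) is stored as a single byte without the UTF8 flag, which
is exactly the case where the historical rules fall back to ASCII semantics.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";   # U+00E9 (é), stored as one native byte

my ($old_uc, $old_word, $new_uc, $new_word);
{
    no feature 'unicode_strings';            # historical rules (the default without 'use v5.12')
    $old_uc   = uc $e_acute;                 # unchanged: only ASCII is case-mapped
    $old_word = $e_acute =~ /\w/ ? 1 : 0;    # 0: \w matches only ASCII word characters here
}
{
    use feature 'unicode_strings';           # "seriously, perl, it's all text"
    $new_uc   = uc $e_acute;                 # "\xC9" (É)
    $new_word = $e_acute =~ /\w/ ? 1 : 0;    # 1: é is a word character
}

printf "old: uc=U+%04X \\w=%d; unicode_strings: uc=U+%04X \\w=%d\n",
    ord $old_uc, $old_word, ord $new_uc, $new_word;
```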

"eval" had The Unicode Bug, which is now fixed in the scope of "use feature
'unicode_eval'".  So, in fact, the behavior of "eval" *does* depend on the
context of where it is called... but the thing that matters is the unicode_eval
feature rather than the utf8 pragma.
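A sketch of that difference, using only core perl.  The same source text is
evaluated twice, once stored as bytes and once after utf8::upgrade (which
changes only perl's internal representation, not the characters).  Without the
feature the answers differ, which is The Unicode Bug; with it they agree.

```perl
use strict;
use warnings;

# Source text containing the characters U+00C3 U+00BF, which are also the two
# octets of the UTF-8 encoding of "ÿ".
my $src_bytes    = "use utf8; length q{\xC3\xBF}";   # stored without the UTF8 flag
my $src_upgraded = $src_bytes;
utf8::upgrade($src_upgraded);   # same characters, different internal encoding

my ($old_bytes, $old_upgraded, $new_bytes, $new_upgraded);
{
    no feature 'unicode_eval';            # historical behaviour
    $old_bytes    = eval $src_bytes;      # read as octets; 'use utf8' decodes them: 1
    $old_upgraded = eval $src_upgraded;   # already read as characters: 2
}
{
    use feature 'unicode_eval';           # also enabled by 'use v5.16' or later
    $new_bytes    = eval $src_bytes;      # always characters, 'use utf8' ignored: 2
    $new_upgraded = eval $src_upgraded;   # 2
}
defined($_) or die "eval failed: $@" for $old_bytes, $old_upgraded, $new_bytes, $new_upgraded;

printf "old: %d vs %d; unicode_eval: %d vs %d\n",
    $old_bytes, $old_upgraded, $new_bytes, $new_upgraded;
```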

Meanwhile, for those cases where one has read a bytestream and wants perl to
evaluate it as if it were reading those bytes from a file, evalbytes was added
(enabled by "use feature 'evalbytes'").
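A minimal sketch of evalbytes, assuming the octets were obtained elsewhere (a
file or socket, say) and are hard-coded here for illustration.  As with a source
file, the octets stay octets unless the evaluated code itself says "use utf8".

```perl
use strict;
use warnings;
use feature 'evalbytes';

# The two octets of the UTF-8 encoding of "ÿ".
my $octets = "\xC3\xBF";

# evalbytes parses its argument as octets, the way perl reads a source file.
my $plain  = evalbytes "length q{$octets}";             # like a file without 'use utf8'
my $pragma = evalbytes "use utf8; length q{$octets}";   # like a file with 'use utf8'
defined $plain && defined $pragma or die "evalbytes failed: $@";

printf "without use utf8: %d, with use utf8: %d\n", $plain, $pragma;
```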

I hope this has clarified things.  The perl string model can be a big pain, but
we are fairly stuck with it at the moment.

> #use utf8;		#doesn't seem to be necessary for utf8 in source
> #	and nothing needs to be done for utf8 on output?

I'm not sure I understand the above comment, but I hope that the question is
answered by my text, above.  "use utf8" does not change how output is written.
It can still change what comes out, though, because it changes the strings you
print:

  ~$ perl -E 'use warnings; my $band = "Queensrÿche"; say $band'
  Queensrÿche
  ~$ perl -E 'use warnings; use utf8; my $band = "Queensrÿche"; say $band'
  Queensr?che

In the first case, we have that 12-element string which contains the raw UTF-8.
If we tried checking it for /\xFF/ it would fail, since it doesn't have that
character.  Similarly, /\p{Latin_1_Supplement}/ would fail.  On the other hand,
it prints back out correctly because my terminal is also UTF-8.

In the second one, /\xFF/ would match (huzzah! and also
\p{Latin_1_Supplement}), but the output is screwed up because perl emits the
single octet 0xFF, which is not valid UTF-8 on its own.  Oops!
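A sketch of both cases plus the usual fix, encoding on output, assuming the
script file is saved as UTF-8.  An in-memory filehandle stands in for STDOUT
(an illustrative choice) so the encoded octets can be inspected; a real program
would apply the same layer with binmode STDOUT, ':encoding(UTF-8)'.

```perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8.
my $raw = "Queensrÿche";       # no 'use utf8': 12 octets, "ÿ" is 0xC3 0xBF
my $decoded;
{
    use utf8;
    $decoded = "Queensrÿche";  # 11 characters, "ÿ" is U+00FF
}

my $raw_has_ff     = $raw     =~ /\xFF/ ? 1 : 0;                          # 0
my $decoded_has_ff = $decoded =~ /\xFF/ ? 1 : 0;                          # 1
my $decoded_latin1 = $decoded =~ /\p{Block=Latin_1_Supplement}/ ? 1 : 0;  # 1

# Encode on the way out instead of letting perl emit the lone octet 0xFF.
open my $out, '>:encoding(UTF-8)', \my $buffer or die "open: $!";
print {$out} $decoded;
close $out or die "close: $!";
# $buffer now holds the 12 octets "Queensr\xC3\xBFche", which a UTF-8
# terminal displays correctly.
```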

If we want to get "worse," we can just pick a worse band!

  ~$ perl -E 'use warnings; use utf8; my $band = "Spın̈al Tap"; say $band'
  Wide character in say at -e line 1.
  Spın̈al Tap

Now we have a string of Unicode codepoints.  There are 11 (rather than the 13
octets in the input to -e), including 9 ASCII characters, the dotless i, and
the combining diaeresis.  The two non-ASCII characters are above 0xFF, so when
they're printed, perl can't just emit the byte with the value of the character.
It punts, emitting the in-memory representation, which happily is UTF-8, so it
*seems* like the program did the right thing.  To remind us that we got lucky
(because we prefer English metal to West Coast USA metal), it emits a warning:
"I just printed a character bigger than 0xFF so you probably forgot to encode."
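A sketch that counts what the paragraph above describes.  The band name is
spelled with escapes (U+0131 and U+0308) so the script's own encoding doesn't
matter.  Encode is a core module; putting an :encoding(UTF-8) layer on STDOUT
is one common way to tell perl "I meant to encode" and avoid the warning.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "Spın̈al Tap": U+0131 LATIN SMALL LETTER DOTLESS I and U+0308 COMBINING
# DIAERESIS; everything else is ASCII.
my $band = "Sp\x{131}n\x{308}al Tap";

my $chars  = length $band;                               # 11 code points
my @wide   = grep { ord($_) > 0xFF } split //, $band;    # the 2 characters above 0xFF
my $octets = length encode('UTF-8', $band);              # 13 octets once encoded

# With an encoding layer, print encodes for us and there is no
# "Wide character" warning.
binmode STDOUT, ':encoding(UTF-8)' or die "binmode: $!";
printf "%d code points, %d above 0xFF, %d octets\n", $chars, scalar @wide, $octets;
print "$band\n";
```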

So, output behaves the same way under "use utf8" other than the fact that your
output-producing code is getting different inputs!

> my $string="“犬夜叉”";
> our $value=int rand 2;
> our $newvalue;
> our $newvalue2;
> use P;
> eval q($newvalue="_$string_, val=$value";);
> @_ and die @_;
> P "value=%s, newvalue=%s, string=%s", $value, $newvalue, $string;
> eval qq($newvalue2="_$string\_, val=$value");
> @_ and die @_;
> P "value=%s, newvalue2=%s, string=%s", $value, $newvalue2, $string;

I don't have P installed, and I wasn't sure it would be worth the trouble.

I'm hoping that explanations above have rendered this section "answered."

-- 
rjbs
