
Re: [perl #45673] parsing in eval() varies with UTF8ness

From: Tels
Date: September 23, 2007 11:22
Subject: Re: [perl #45673] parsing in eval() varies with UTF8ness
Message ID: 200709231816.35500@bloodgate.com

Moin,

On Saturday 22 September 2007 23:55:20 Zefram wrote:
> # New Ticket Created by  Zefram
[snip]
>
> $ perl -we '$a="require x\x{f1}y::z"; eval $a; print $@'
> Warning: Use of "require" without parentheses is ambiguous at (eval 1) line 1.
> Unrecognized character \xF1 at (eval 1) line 1.
> $ perl -we '$a="require x\x{f1}y::z"; utf8::upgrade($a); eval $a; print $@'
> Can't locate xZZy/z.pm in @INC (@INC contains: /etc/perl
> /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5
> /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8
> /usr/local/lib/site_perl /usr/local/lib/perl/5.8.4
> /usr/local/share/perl/5.8.4 .) at (eval 1) line 3.
> $
>
> What I show above as "ZZ" was originally a sequence of two non-ASCII
> characters: U+00c3 (Latin capital letter A with tilde) and U+00b1
> (plus-minus sign).  I've replaced them with ASCII characters to avoid
> unpredictable manglement.

The byte sequence C3 B1 is the UTF-8 encoding of character 0xf1, so that is
right.
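
The byte arithmetic can be checked with a short script. This is only a sketch, assuming the core Encode module is available; the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00F1, LATIN SMALL LETTER N WITH TILDE, as a one-character string.
my $n_tilde = "\x{f1}";

# Encode it to UTF-8 bytes and render each byte as two hex digits.
my $utf8_bytes = encode('UTF-8', $n_tilde);
my $hex = join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
print "UTF-8 bytes: $hex\n";    # C3 B1

# Misreading those two bytes as Latin-1 yields two characters,
# U+00C3 and U+00B1, which is the "ZZ" of the quoted report.
my @misread = map { sprintf 'U+%04X', ord }
              split //, decode('ISO-8859-1', $utf8_bytes);
print "Read as Latin-1: @misread\n";    # U+00C3 U+00B1
```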

> The phenomenon we see here is that the syntax of Perl, as judged by
> eval(), varies according to whether the input string is physically
> encoded in UTF8.  If it is so encoded then U+00f1, Latin small letter N
> with tilde, is an acceptable identifier character, and so can be part
> of a module name.  If not, then the very same character is invalid in
> that context and causes a syntax error.
>
> What, exactly, is Perl's identifier syntax?  Is U+00f1 a valid identifier
> character?
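
What "physically encoded in UTF8" means in the quoted report can be observed from Perl itself. The following is a minimal sketch, not taken from the ticket, using the always-available utf8::upgrade() and utf8::is_utf8() functions; $plain and $upgraded are illustrative names:

```perl
use strict;
use warnings;

my $plain    = "require x\x{f1}y::z";   # all characters < 0x100, so not UTF-8 flagged
my $upgraded = $plain;
utf8::upgrade($upgraded);               # same characters, UTF-8 internal storage

printf "plain is UTF-8 flagged:    %s\n", utf8::is_utf8($plain)    ? 'yes' : 'no';
printf "upgraded is UTF-8 flagged: %s\n", utf8::is_utf8($upgraded) ? 'yes' : 'no';
printf "strings compare equal:     %s\n", $plain eq $upgraded      ? 'yes' : 'no';
printf "lengths: %d and %d\n", length($plain), length($upgraded);
```

At the Perl level the two strings are the same 14 characters and compare equal; only the internal representation differs, which is why a difference in how eval() parses them is surprising.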

When you don't "use utf8;", your script is expected to be in Latin-1
(ISO-8859-1). (We leave "use locale" out of this for now.) Under "use utf8",
it can contain any UTF-8.

However, it seems eval() (or require?) doesn't follow this rule. Plus, I am
not entirely sure how much Unicode you can use in identifiers, as something
like this:

	#!perl
	use utf8;
	my $€ = 1;

still fails to compile with:

	Unrecognized character \x82 at t.pl line 5.
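
One possible reading of this error, offered as an assumption rather than something established in this thread, is that the euro sign is a currency symbol and not a letter, so it would not qualify as an identifier character even with "use utf8". The sketch below uses \w only as a rough stand-in for "could appear in an identifier", and also shows where the byte 0x82 comes from; the exact lexer behaviour in 5.8.8 is not verified here:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Both characters are written as escapes so this file stays pure ASCII.
my $euro    = "\x{20ac}";   # EURO SIGN
my $n_tilde = "\x{f1}";     # LATIN SMALL LETTER N WITH TILDE
utf8::upgrade($n_tilde);    # make \w apply Unicode rules to this string

# The euro sign occupies three bytes in UTF-8; the second one is 0x82.
my $hex = join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $euro);
print "EURO SIGN in UTF-8: $hex\n";    # E2 82 AC

# Compare how the two characters fare against the word-character class.
printf "n-tilde matches \\w:   %s\n", $n_tilde =~ /\w/ ? 'yes' : 'no';
printf "euro sign matches \\w: %s\n", $euro    =~ /\w/ ? 'yes' : 'no';
```

If that reading is right, a letter such as U+00F1 would be a fairer test of Unicode identifiers than the euro sign.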

perldoc perlsyn (in 5.8.8) doesn't seem to say anything about identifiers.

perldoc utf8 says:

       Enabling the "utf8" pragma has the following effect:

	Bytes in the source text that have their high-bit set will be
	treated as being part of a literal UTF-8 character.  This
	includes most literals such as identifier names, string
	constants, and constant regular expression patterns.

But it doesn't seem to work, at least in v5.8.8.

All the best,

Tels


-- 
 Signed on Sun Sep 23 18:05:15 2007 with key 0x93B84C15.
 Get one of my photo posters: http://bloodgate.com/posters
 PGP key on http://bloodgate.com/tels.asc or per email.

 "Spammed if you do, spammed if you don't."

  -- Murphy's Law

