Re: [perl #45673] parsing in eval() varies with UTF8ness

From: Zefram
Date: August 26, 2009 15:47
Subject: Re: [perl #45673] parsing in eval() varies with UTF8ness
Message ID: 20090826224711.GE11252@fysh.org

Chip Salzenberg wrote:
>You have just agreed with me.  "Change of representation" = "encoding".

utf8::upgrade affects *internal* encoding.  Not the user-visible content
of the string.
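
A minimal sketch of that distinction, using only the core utf8::
functions. The variable names are made up for illustration, and the
script assumes it is saved without "use utf8", so the literal starts
out in the byte (Latin-1) representation:

```perl
use strict;
use warnings;

# Four characters; the literal is stored internally as Latin-1 bytes.
my $latin1 = "caf\xe9";

# Copy it, then switch only the copy to the internal UTF-8 representation.
my $upgraded = $latin1;
utf8::upgrade($upgraded);

# The internal flag differs between the two copies...
printf "latin1 flagged as UTF-8:   %s\n", utf8::is_utf8($latin1)   ? "yes" : "no";
printf "upgraded flagged as UTF-8: %s\n", utf8::is_utf8($upgraded) ? "yes" : "no";

# ...but at the character level they are the same string.
printf "lengths: %d and %d\n", length($latin1), length($upgraded);
printf "strings compare %s\n", $latin1 eq $upgraded ? "equal" : "different";
```

Only utf8::is_utf8 can tell the two copies apart; length and eq, which
work on characters, cannot.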

>Perl's parser takes bytes and gives them meaning.  If you change the bytes,
>you can't expect Perl's parser to ignore that.

String eval is an operation on a string.  A string of *characters*, in
current Perl.  The Perl parser claims to ascribe meaning to characters,
not to bytes per se.

Obviously it's internally working with bytes.  A Perl source file on
disk is really a sequence of bytes, and the interpretation of those
bytes as characters is influenced by the "use utf8" pragma.  In the case
of string eval, the Perl string object already knows what characters it
represents, so without any pragma it already knows whether the internal
byte sequence needs to be interpreted via Latin-1 or UTF-8.  ("use utf8"
in a string eval seems meaningless.)
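
As a rough way to probe this, one can eval the same character string
in both internal representations and compare the results. This is
only a sketch: the code string below is a hypothetical probe, not the
test case from the ticket, and whether the two results agree may
depend on the Perl version and on which construct is being parsed.

```perl
use strict;
use warnings;

# The same source text twice: once in the byte (Latin-1) representation,
# once upgraded to the internal UTF-8 representation.
my $code_bytes = "length(qq{\xe9})";
my $code_chars = $code_bytes;
utf8::upgrade($code_chars);

# At the character level the two code strings are identical.
print "code strings are ", ($code_bytes eq $code_chars ? "equal" : "different"), "\n";

# Evaluate each one; a parse failure would leave the result undefined.
my $result_bytes = eval $code_bytes;
my $result_chars = eval $code_chars;

for my $pair (["bytes", $result_bytes], ["chars", $result_chars]) {
    my ($label, $value) = @$pair;
    print "$label: ", (defined $value ? $value : "undef"), "\n";
}
```

If the parser were responding purely to the character sequence, the
two results would have to match for any code string, not just this
one.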

I believe the bug here is that the Perl parser is not consistently
responding to the character sequence.  This is presumably due to it
being implemented at the byte level, with insufficient abstraction.

-zefram
