perl.perl5.porters | Postings from December 2010

Re: [perl #80030] Matching upper ASCII characters from file in RE patterns

From: Eric Brine
Date: December 9, 2010 12:50
Subject: Re: [perl #80030] Matching upper ASCII characters from file in RE patterns
Message ID: AANLkTikwesHQ-omoLE1Vz4eU2e9DYcnFh9RN8N-89r3j@mail.gmail.com
On Thu, Dec 9, 2010 at 11:58 AM, Jonathan Pool <pool@utilika.org> wrote:

> > The file is being read in without issue. The problem is with the literals
> in the source file.
> >
> > You explicitly stated you wanted different behaviour from the literal by
> using "use encoding".
> >
> > perl -e'use encoding "utf8"; qr/[\x7F-\x80]/'
> >
> > means
> >
> > perl -e'qr/{{{decode("utf8", "[\x7F-\x80]")}}}/'
> >
> > which becomes
> >
> > perl -e'qr/[\x7F-\x{FFFD}]/'
> >
> > The effect of "use encoding" on \x escapes in literals and the like is
> why some people avoid "use encoding".
>
> Thank you for this explanation.
>
> So, is it possible for the source code (in a UTF-8 file) to use \x80 (or
> any numeric \x escape) to represent the character U+0080?
>
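The rewrite quoted above can be checked roughly by applying Encode's `decode` by hand to the same bytes the literal contains. This is a sketch that models the effect of `use encoding "utf8"` on the literal. It does not use the pragma itself and is not how perl implements it internally. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes of the literal as written: '[', 0x7F, '-', 0x80, ']'.
my $bytes = "[\x7F-\x80]";

# Model of what `use encoding "utf8"` does to the literal: decode it as UTF-8.
my $decoded = decode('utf8', $bytes);

# A lone 0x80 byte is not valid UTF-8, so decode substitutes U+FFFD for it,
# turning the class [\x7F-\x80] into [\x7F-\x{FFFD}].
printf "%vX\n", $decoded;   # 5B.7F.2D.FFFD.5D
```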

C2 80 is the UTF-8 encoding of U+0080, so the following are equivalent:

$x = "\x80";

and

use encoding 'UTF-8';
$x = "\xC2\x80";

(Except perhaps in how the UTF8 flag is set, but that's not supposed to make
a difference.)
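The equivalence can be sketched with Encode as well. The decoding step that `use encoding 'UTF-8'` would apply to the literal "\xC2\x80" is modelled here by an explicit `decode`, so the pragma itself is not used. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Without `use encoding`, "\x80" is the single character U+0080.
my $plain = "\x80";

# Model of `use encoding 'UTF-8'; "\xC2\x80"`: the bytes C2 80 are decoded.
my $from_bytes = decode('UTF-8', "\xC2\x80");

print $plain eq $from_bytes ? "equal\n" : "different\n";   # equal
printf "U+%04X\n", ord $from_bytes;                         # U+0080

# The internal UTF8 flag may differ between the two strings,
# but eq compares characters, not the internal representation.
printf "UTF8 flag: plain=%d decoded=%d\n",
    utf8::is_utf8($plain) ? 1 : 0, utf8::is_utf8($from_bytes) ? 1 : 0;
```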

- Eric


