perl.perl5.porters | Postings from December 2010

Re: [perl #80030] Matching upper ASCII characters from file in RE patterns

From: Eric Brine
Date: December 8, 2010 23:17
Subject: Re: [perl #80030] Matching upper ASCII characters from file in RE patterns
Message ID: AANLkTinq31Bd27fHTMO6+J2RGPonSAB0hLJvgJjCCAci@mail.gmail.com
On Thu, Dec 9, 2010 at 1:35 AM, Jonathan Pool <pool@utilika.org> wrote:

> > Jonathan, you said that the encoding was utf8, but \x80 is not a legal
> > utf8-encoded character. But it should have warned that it was substituting
> > FFFD.
>
> The script reads a line from a UTF8-encoded file into a Perl scalar.
>

The file is being read in without issue. The problem is with the literals in
the source file.

> It then operates on the scalar.
>
> In man perlunicode, one reads: "Unless explicitly stated, Perl operators
> use [...]
>

You explicitly stated that you wanted different behaviour from the literals
by using "use encoding".

perl -e'use encoding "utf8"; qr/[\x7F-\x80]/'

means

perl -e'qr/{{{decode("utf8", "[\x7F-\x80]")}}}/'

which becomes

perl -e'qr/[\x7F-\x{FFFD}]/'
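A minimal Perl sketch of that equivalence, assuming Encode's decode("UTF-8", ...) with its default substitution behaviour is a fair stand-in for what "use encoding" does to the literal: the lone 0x80 byte is malformed UTF-8 and decodes to U+FFFD, so the class ends up spanning U+007F through U+FFFD.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The character class as written in the source, i.e. the raw bytes
# "[", 0x7F, "-", 0x80, "]".
my $literal = "[\x7F-\x80]";

# Assumption: decoding the literal as UTF-8 approximates the effect of
# "use encoding 'utf8'". A lone 0x80 byte is malformed UTF-8, and
# Encode's default CHECK replaces it with U+FFFD.
my $decoded = decode("UTF-8", $literal);

printf "%vX\n", $decoded;    # prints 5B.7F.2D.FFFD.5D

# The pattern now covers U+007F..U+FFFD, not just two code points.
my $re = qr/$decoded/;
print "U+00E9 is in the class\n" if "\x{E9}" =~ $re;
```

Under that reading, a character such as U+00E9 matches even though it lies well outside the intended two-character range.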

The effect of "use encoding" on \x escapes in literals and the like is why
some people avoid "use encoding".
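A sketch of the usual alternative, assuming the input is decoded explicitly instead of relying on "use encoding": the \x escapes in the pattern then keep their plain code-point meaning. The byte string below is a hypothetical stand-in for a line read from a UTF-8 file (for example through an :encoding(UTF-8) layer).

```perl
use strict;
use warnings;
use Encode qw(decode);

# No "use encoding": \x7F and \x80 are code points U+007F and U+0080,
# so this class matches exactly those two characters.
my $re = qr/[\x7F-\x80]/;

# Hypothetical input bytes: "a", U+0080 (C2 80), U+00E9 (C3 A9).
my $bytes = "a\xC2\x80\xC3\xA9";
my $line  = decode("UTF-8", $bytes);

# Collect every character of the decoded line that falls in the class.
my @hits = $line =~ /($re)/g;
printf "%d match(es): %vX\n", scalar(@hits), join("", @hits);    # prints 1 match(es): 80
```

Here only U+0080 matches; U+00E9 stays outside the range because the literal was never reinterpreted.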
