perl.perl5.porters | Postings from December 2010
Re: [perl #80030] Matching upper ASCII characters from file in RE patterns
From: demerphq
Date: December 11, 2010 05:43
Subject: Re: [perl #80030] Matching upper ASCII characters from file in RE patterns
Message ID: AANLkTimO26K9-h1aK5qaimy7TH99+_kb5PeTgi7bFuYQ@mail.gmail.com
On 9 December 2010 08:16, Eric Brine <ikegami@adaelis.com> wrote:
> On Thu, Dec 9, 2010 at 1:35 AM, Jonathan Pool <pool@utilika.org> wrote:
>
>> > Jonathan, you said that the encoding was utf8, but \x80 is not a legal
>> utf8-encoded character. But it should have warned that it was substituting
>> FFFD.
>>
>> The script reads a line from a UTF8-encoded file into a Perl scalar.
>>
>
> The file is being read in without issue. The problem is with the literals in
> the source file.
>
>> It then operates on the scalar.
>>
>> In man perlunicode, one reads: "Unless explicitly stated, Perl operators
>> use [...]
>>
>
> You explicitly stated you wanted different behaviour from the literal by
> using "use encoding".
>
> perl -e'use encoding "utf8"; qr/[\x7F-\x80]/'
>
> means
>
> perl -e'qr/{{{decode("utf8", "[\x7F-\x80]")}}}/'
>
> which becomes
>
> perl -e'qr/[\x7F-\x{FFFD}]/'
>
> The effect of "use encoding" on \x escapes in literals and the like is why
> some people avoid "use encoding".
Yes. For many people, me included, it seems rather insane. I guess it
makes sense to some, but I really wish they had picked a different
escape rather than remapping \x{}.
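A minimal Perl sketch of Eric's model, done by hand with Encode::decode rather than through the pragma. It assumes that Encode's default (substituting) decode is a fair stand-in for what `use encoding "utf8"` does to the literal; `use encoding` itself was deprecated in later perls, so the sketch avoids it. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: this explicit decode models what `use encoding "utf8"` does to
# the pattern source before the regex compiler sees it.
my $source  = "[\x7F-\x80]";            # five bytes: 5B 7F 2D 80 5D
my $decoded = decode("UTF-8", $source);

# 0x80 on its own is a UTF-8 continuation byte, not a valid character, so the
# default decode substitutes U+FFFD REPLACEMENT CHARACTER for it.
printf "U+%04X\n", ord for split //, $decoded;

# Without the decode step, the class is exactly 0x7F..0x80 as written.
my $plain = qr/[\x7F-\x80]/;
print "chr(0x80) matches the plain class\n" if chr(0x80) =~ $plain;
```

With the decode step, the \x80 end of the range turns into U+FFFD, so the class silently widens to \x7F-\x{FFFD}; without it, the class contains just the two code points written.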
Also, and much worse: at least up until 5.10, this insane remapping of
code points also affected the \N{U+$codepoint} syntax. It has been fixed
at some point since then, as it is not in blead, but I haven't checked
when, or whether the fix is in 5.12.
$ perl -v && perl -le'use encoding "iso 8859-7"; $a = "\xDF"; $b="\N{U+DF}"; printf "0x%04x\n", ord for $a,$b'
This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi
Copyright 1987-2009, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
0x03af
0x03af
$ ./perl -v && ./perl -Ilib -le'use encoding "iso 8859-7"; $a = "\xDF"; $b="\N{U+DF}"; printf "0x%04x\n", ord for $a,$b'
This is perl 5, version 13, subversion 7 (v5.13.7-265-gb1811a1*) built for i686-linux
(with 1 registered patch, see perl -V for more detail)
Copyright 1987-2010, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
0x03af
0x00df
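The two results can be reproduced without the pragma. This is a sketch, assuming Encode's iso-8859-7 table matches what `use encoding "iso 8859-7"` applied to the "\xDF" literal: byte 0xDF in ISO 8859-7 is U+03AF GREEK SMALL LETTER IOTA WITH TONOS, while \N{U+DF} names U+00DF LATIN SMALL LETTER SHARP S directly and should never be decoded. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: decoding the byte by hand models how the pragma treats "\xDF".
my $via_encoding = decode("iso-8859-7", "\xDF");

# \N{U+...} names a Unicode code point, so no source decoding should apply.
my $code_point = "\N{U+DF}";

# prints 0x03af then 0x00df, matching the blead output above
printf "0x%04x\n", ord for $via_encoding, $code_point;
```

The 5.10.1 run prints 0x03af for both lines, which is the bug: the pragma's remapping leaked into \N{U+DF}.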
--
perl -Mre=debug -e "/just|another|perl|hacker/"