[perl #113620] highly illegal variable names are now accidentally legal

From: Father Chrysostomos via RT
Date: July 1, 2012 14:03
Subject: [perl #113620] highly illegal variable names are now accidentally legal
Message ID: rt-3.6.HEAD-28836-1341176609-961.113620-15-0@perl.org

On Tue Jun 26 15:18:01 2012, Hugmeir wrote:
> https://github.com/Hugmeir/utf8mess/tree/restrict_variable_names
> 
> So, I've taken a few liberties implementing this. Here's the executive
> summary of the branch:
> Length-one variables must match (?: (?=Word) [\p{XIDS}_] |
> [\p{POSIX_Punct}\p{POSIX_Digit}\p{POSIX_Cntrl}] ). This is regardless of
> whether 'use utf8;' is in effect, so $£ is now always illegal, though
> expanding this to use some broader definition of punctuation/controls
> should be simple; it's just changing one macro.

Did you see the last few messages in this thread?  I think we should be
restricting it to the Latin-1 range, allowing all 0-255 characters as
punct vars, including \xad, unless they are id chars.
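
To make the Latin-1 proposal concrete, here is a minimal sketch of the
classification it implies. The helper classify_latin1 is hypothetical and
is not code from perl or from the branch. It also assumes that "id char"
means a character that may start an identifier (XID_Start, the long name
of XIDS, or an underscore); the thread does not pin that down.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of perl or of the branch under discussion.
# Assumption: an "id char" is one that may start an identifier,
# i.e. XID_Start (long name of XIDS) or an underscore.
# Under this simplification ASCII digits also land in 'punctuation',
# alongside the other special variables.
sub classify_latin1 {
    my ($code_point) = @_;
    die "code point $code_point is outside 0-255\n"
        if $code_point < 0 || $code_point > 255;

    my $char = chr $code_point;
    utf8::upgrade($char);    # make sure 128-255 get Unicode semantics

    return $char =~ /\A[\p{XID_Start}_]\z/ ? 'identifier' : 'punctuation';
}

# 'a', '_', a-grave, POUND SIGN, SOFT HYPHEN
for my $code_point (0x61, 0x5F, 0xE0, 0xA3, 0xAD) {
    printf "U+%04X => %s\n", $code_point, classify_latin1($code_point);
}
```

With this reading, \xe0 (à) stays an identifier character, while \xa3 (£)
and \xad fall into the punctuation-variable group.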

> And like mentioned before, valid characters in an identifier no longer vary
> depending on 'use utf8', except for the obvious restriction that under 'no
> utf8;' the characters belong solely to the Latin-1 range.

I am not sure that is such a good idea.  Formerly, any Latin-1
characters could be used as pyoq delimiters.  There are many old scripts
still in use that have never needed to be rewritten.

‘use utf8’ is a pragma after all. :-)

> pod/perldata.pod
> has a section streamlining the rules. As a side effect, 'no utf8; use
> strict; $à' now has to declare $à with my(), as it well should.

I am the backward compatibility police, so I disagree. :-)

> The branch also fixes a bug in word and identifier parsing, where ASCII
> alphanumerics would be eaten up without checking if the next character
> matched \p{XIDC}. This led to qq\N{MIDDLE DOT} test \N{MIDDLE DOT} working
> in previous versions, but MIDDLE DOT is an XIDC character, so now that's
> parsed as bareword( qq\N{MIDDLE DOT} ), bareword( test ), ???? XIDC
> character on its own, syntax error. To get the previous behavior, you need
> a space before the delimiter, which is consistent with how 'q mfoom' works.
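
The MIDDLE DOT case can be checked with Unicode properties alone. The
following is a rough sketch, not the tokenizer's actual logic: the three
subs are hypothetical, and the regex only approximates "one identifier
start character followed by identifier continue characters", using the
long property names XID_Start and XID_Continue for XIDS and XIDC.

```perl
use strict;
use warnings;

my $middle_dot = "\N{U+00B7}";    # MIDDLE DOT

# Hypothetical helpers for illustration only.
sub can_start_identifier    { return $_[0] =~ /\A[\p{XID_Start}_]\z/ ? 1 : 0 }
sub can_continue_identifier { return $_[0] =~ /\A\p{XID_Continue}\z/ ? 1 : 0 }

# Longest identifier-like prefix of a source string, or undef if none.
sub leading_word {
    my ($source) = @_;
    return $source =~ /\A([\p{XID_Start}_]\p{XID_Continue}*)/ ? $1 : undef;
}

my $glued  = "qq${middle_dot} test ${middle_dot}";     # no space after qq
my $spaced = "qq ${middle_dot} test ${middle_dot}";    # space after qq

printf "MIDDLE DOT starts: %d, continues: %d\n",
    can_start_identifier($middle_dot), can_continue_identifier($middle_dot);
printf "glued word length:  %d\n", length(leading_word($glued));
printf "spaced word length: %d\n", length(leading_word($spaced));
```

MIDDLE DOT can continue an identifier but cannot start one, so without
the space the leading word swallows the would-be delimiter; with the
space the word is just qq.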

As I mentioned some time before, changing the rules for what is an
identifier is not a backward-compatible change.  If we want to make
Perl’s syntax conform more closely with Unicode recommendations, we
should do it all at once (ids, whitespace, and Pattern_Syntax for
delimiters) with a single feature feature.

Someone may mention (and someone has mentioned) that Perl isn’t ‘doing
it right’.  But Perl has been ‘doing it’ since before the current
definition of ‘right’ existed.

> Internally, three things might be sorta icky and really need someone to
> look them over; First, I changed the definition of isIDFIRST_lazy_if and
> isALNUM_lazy_if to use isIDFIRST_L1(*s) and (isALNUMC_L1(*s) || *s == '_'),
> respectively, if we aren't under UTF mode.
> Second, to fix the "ascii letters being consumed too early" bug above, I
> had to turn around how scan_ident and scan_word work, by putting the UTF
> case first. This probably leads to some slowdowns.
> Third, I've changed several spots from using isALNUM_lazy_if to
> isIDFIRST_lazy_if -- This made sense to me at the time, but an extra pair
> of eyes would be welcome.

Well, I haven’t read your patch because I disagree with it in principle.

-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=113620
