perl.perl5.porters | Postings from April 2014

RFC: What should end qr//x comment ?

From: Karl Williamson
Date: April 27, 2014 04:44
Subject: RFC: What should end qr//x comment ?
Message ID: 535C8B72.4030804@khwilliamson.com

We decided a couple of releases ago that /x would eventually ignore all
the white-space characters that Unicode specifies for this purpose,
namely those matching the property \p{Pattern White Space}.  There are
11 code points in this property, and Unicode guarantees that this will
never change.

(As an aside, if Unicode really wanted to change this, they would
introduce a new property, something like \p{XPatWS}, and encourage
people to migrate to it.)

Fortunately, the set of code points that Perl has accepted as
white-space under /x is a proper subset of what Unicode suggests.  The 5
missing ones are:
     U+0085 NEXT LINE
     U+200E LEFT-TO-RIGHT MARK
     U+200F RIGHT-TO-LEFT MARK
     U+2028 LINE SEPARATOR
     U+2029 PARAGRAPH SEPARATOR

Two of these are for rudimentary processing for languages that are 
written Right-to-Left, but the other three are all intended to start (at 
least) a new line.
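
To make the arithmetic above concrete, here is a small Python model of the two sets. It is only an illustration, not Perl's implementation. The legacy set is inferred from the post: the 11 code points of the property minus the 5 listed as missing. The variable names are made up for this sketch.

```python
# The 11 code points of Unicode's Pattern_White_Space property.
PATTERN_WHITE_SPACE = frozenset({
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D,  # TAB, LF, VT, FF, CR
    0x0020,                                  # SPACE
    0x0085,                                  # NEXT LINE
    0x200E, 0x200F,                          # LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
    0x2028, 0x2029,                          # LINE / PARAGRAPH SEPARATOR
})

# Inferred from the post: what /x skipped before the change.
LEGACY_X_WHITE_SPACE = frozenset(
    {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020}
)

# The code points that /x would newly ignore.
missing = sorted(PATTERN_WHITE_SPACE - LEGACY_X_WHITE_SPACE)

# The post's split of the 5 into directional marks and new-line characters.
DIRECTIONAL_MARKS = {0x200E, 0x200F}
NEW_LINE_CHARS = set(missing) - DIRECTIONAL_MARKS

print([f"U+{cp:04X}" for cp in missing])
print([f"U+{cp:04X}" for cp in sorted(NEW_LINE_CHARS)])
```

The first line printed lists the five code points given above; the second lists the three that are meant to start a new line.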

Releases 5.18 and 5.20 raise a default-on deprecation warning when any
of these 5 characters is used as a literal in a /x pattern.  That means
that we can change 5.22 to skip them under /x.

In implementing this, I realized that the right thing seems to be to
end a comment not just at a \n, but at any of the three characters that
indicate a new line (U+0085, U+2028, U+2029).  But I want to give a
chance for dissenting opinions.
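
A rough sketch of what the proposal changes, again as a Python model rather than the real regex compiler. The function `strip_x_comments` and both constant names are hypothetical. The model ignores escapes, bracketed character classes, and the skipping of white space itself, and only shows where a `#` comment stops.

```python
# Characters that end a /x comment under the proposal, and before it.
PROPOSED_COMMENT_ENDERS = frozenset("\n\u0085\u2028\u2029")
LEGACY_COMMENT_ENDERS = frozenset("\n")


def strip_x_comments(pattern, enders=PROPOSED_COMMENT_ENDERS):
    """Drop '#' comments from a /x-style pattern (simplified model)."""
    out = []
    in_comment = False
    for ch in pattern:
        if in_comment:
            if ch in enders:
                in_comment = False
                out.append(ch)  # keep the terminator; it is plain white space
            continue            # everything else inside a comment is dropped
        if ch == "#":
            in_comment = True
            continue
        out.append(ch)
    return "".join(out)


pattern = "a # first\u2028b # second\nc"
print(repr(strip_x_comments(pattern)))                         # 'a \u2028b \nc'
print(repr(strip_x_comments(pattern, LEGACY_COMMENT_ENDERS)))  # 'a \nc'
```

With only \n ending a comment, the `b` after U+2028 is swallowed by the first comment. Under the proposal it stays part of the pattern.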

One might argue that any of the vertical white-space controls (FF, VT,
and especially CR) should also end a comment.  All of these match \R
(linebreak), so that would make sense.  But it has worked the other way
for a long time without apparent problem, so I think we should just
leave these as-is.

There is a minor glitch: the still-experimental (?[ ]) regex sets code
already allows all of the pattern white-space characters, but it ends a
comment at anything that matches \R, instead of what I'm proposing
here.  Since this is experimental, we can change it any way that is
convenient.
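
The difference between the two rules is small, and a short Python model makes it explicit. The set names are made up for this sketch. The \R set here lists only the single code points \R matches; \R also matches the two-character sequence CR LF, which is left out.

```python
# Single code points matched by \R: LF, VT, FF, CR, NEL, LS, PS.
LINEBREAK_R = frozenset("\n\x0b\x0c\r\u0085\u2028\u2029")

# Comment terminators under the proposal in this post.
PROPOSED_COMMENT_ENDERS = frozenset("\n\u0085\u2028\u2029")

# Code points that end a comment in (?[ ]) but would not do so under /x.
only_in_R = sorted(ord(c) for c in LINEBREAK_R - PROPOSED_COMMENT_ENDERS)
print([f"U+{cp:04X}" for cp in only_in_R])  # ['U+000B', 'U+000C', 'U+000D']
```

So the two rules disagree only on VT, FF, and CR, the same three controls discussed above.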
