
NWCLARK TPF grant report #84

From: Nicholas Clark
Date: June 10, 2013 13:05
Subject: NWCLARK TPF grant report #84
Message ID: 20130610130512.GX4940@plum.flirble.org

[Hours]		[Activity]
2013/04/08	Monday
10.25		RT #116943/S_scan_ident
=====
10.25

2013/04/09	Tuesday
 6.75		RT #116943/S_scan_ident
 0.75		reading/responding to list mail
=====
 7.50

2013/04/10	Wednesday
 0.25		HvFILL
 0.25		RT #117003
 0.25		RT #117327
 0.25		RT #117543
 2.75		RT #54044
 3.00		reading/responding to list mail
=====
 6.75

2013/04/12	Friday
 0.25		HvFILL
 3.75		reading/responding to list mail
=====
 4.00

2013/04/14	Sunday
 2.75		Unicode Names
=====
 2.75

Which I calculate is 31.25 hours

The major activity this week was having second thoughts about $* and friends.
As mentioned in last month's report, everything smoked fine, so the change
was merged to blead. Only then did the problems emerge. Specifically Father
Chrysostomos demonstrated rather succinctly that the core's tests weren't
comprehensive enough. The tests correctly verified that using any of @*, &*,
** and %* generated the desired deprecation warning. But the warning was
also generated by *{*}, *{"*"} and C<$_ = "*"; *$_>, none of which "need"
to be deprecated. Nothing tested these, so nothing noticed that they now
warned. This was because I'd adapted the code used to warn for $*, and made
it warn for all. However, there's a significant difference. The "magic"
functionality of $* (globally setting multiline matching) was what was
deprecated and then removed. It was nothing to do with parsing the two
characters $* as a punctuation variable, hence the warning needed to be
triggered by any *use* of the scalar variable *, independent of what syntax
was used to assign to it. For this reason, the best place to inject that
warning was the code which creates typeglobs and used to set up the magic
which made $* work. As that code is dealing in typeglobs, it already had
logic to determine whether the request was for the SCALAR slot or one of the
others, so it was simple to extend it to warn for all slots, extending
warnings from $* to all variables named *.

Simple, obvious and wrong.

The error being that the intent was to deprecate the *syntax* @* etc, not
use of the variable itself. Hence the right place to insert a deprecation would
be in the parser. Specifically, toke.c. 12151 lines of horror most aptly
summarised as 'It all comes from here, the stench and the peril.'
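
To make the mistake concrete, here is a toy model of a name-based check made
at glob creation time. It is not the core's code: glob_create_warns() and
enum slot_kind are names invented for this sketch, and the real slot handling
is more involved. The point it illustrates is that such a check sees only the
variable name and the requested slot, never the syntax that led there.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical slot kinds, for illustration only. */
enum slot_kind { SLOT_SCALAR, SLOT_ARRAY, SLOT_HASH, SLOT_CODE, SLOT_GLOB };

/* Hypothetical name-based check at glob creation.
 * all_slots == false models the original behaviour (warn for $* only);
 * all_slots == true models the extended, overly broad behaviour. */
bool
glob_create_warns(const char *name, enum slot_kind slot, bool all_slots)
{
    if (strcmp(name, "*") != 0)
        return false;               /* only the variable named "*" matters */
    return all_slots || slot == SLOT_SCALAR;
}
```

Under this model a symbolic lookup such as *{"*"} arrives as
glob_create_warns("*", SLOT_GLOB, true), which warns even though no @*, &*,
** or %* was ever written in the source.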

Strangely for toke.c, it seems that it's actually fairly easy to deprecate
the parsing of @* etc. Tokens are parsed by a routine S_scan_ident() which
is relatively self-contained, and the control flow around it is also fairly
clear. So a warning can be issued by adding another parameter to that routine,
and only setting it true from the 4 places in the parser that deal with things
starting '@', '&', '*', and '%' respectively. This worked.
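
The shape of that change can be sketched as follows. This is a simplified
stand-in, not the code in toke.c: scan_ident_sketch(), parse_variable_sketch()
and star_warnings are hypothetical names, identifiers are ASCII-only, and the
warning text is a placeholder rather than perl's actual diagnostic.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter so that callers can observe how many warnings fired. */
int star_warnings = 0;

/* Simplified stand-in for an identifier scanner; NOT the code in toke.c.
 * s points at a sigil. Copies the name that follows into dest (at most
 * destlen - 1 characters) and returns a pointer just past the name.
 * warn_star is the extra parameter: when true, a name of "*" warns. */
const char *
scan_ident_sketch(const char *s, char *dest, size_t destlen, bool warn_star)
{
    const char sigil = *s;
    size_t n = 0;

    assert(sigil != '\0' && destlen >= 2);
    s++;                                /* step over the sigil */
    if (*s == '*') {                    /* punctuation variable named "*" */
        if (warn_star) {
            star_warnings++;
            /* Placeholder text, not perl's actual diagnostic. */
            fprintf(stderr, "sketch: %c* is deprecated\n", sigil);
        }
        dest[n++] = *s++;
    } else {
        while ((isalnum((unsigned char)*s) || *s == '_') && n + 1 < destlen)
            dest[n++] = *s++;
    }
    dest[n] = '\0';
    return s;
}

/* Stand-in for the per-sigil call sites: only the callers handling
 * '@', '&', '*' and '%' pass true, so plain $* is left alone here. */
const char *
parse_variable_sketch(const char *s, char *dest, size_t destlen)
{
    const bool warn_star = (*s != '\0' && strchr("@&*%", *s) != NULL);
    return scan_ident_sketch(s, dest, destlen, warn_star);
}
```

Because the flag is set only where source text is scanned, a symbolic
reference such as *{"*"} never reaches this code and so cannot trigger the
warning.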

However, the second thoughts went deeper than that. I think that even this
approach is wrong on two further levels.

Firstly the intent was to enable syntax of the form @*foo and %*bar.
Having @*foo and %*bar would seem to imply that one can't also have @* or %*.

What hadn't sunk in is that we have both $# and $#foo (and $#$foo), and there
doesn't seem to be a parsing problem with this. There *is* some special casing
for which punctuation vars $#... will work on, notably only @+, @- and @@:

        if (s[1] == '#' && (isIDFIRST_lazy_if(s+2,UTF) || strchr("{$:+-@", s[2]))) {
            PL_tokenbuf[0] = '@';


and, unlike most of Perl 5, recognising $#... is space sensitive:

    $ perl -le '$foo = [1..3]; $# = \*STDERR; print $#{$foo}'
    $# is no longer supported at -e line 1.
    2
    $ perl -le '$foo = "bar"; %# = (bar => "Baz"); $# = \*STDERR; print $# {$foo}'
    Baz

(in the latter, $# {$foo} is a hash lookup for key $foo of hash %#)

but it generally works without surprising anyone.
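
The space sensitivity falls straight out of that one-character lookahead. A
minimal standalone sketch of the same test, assuming ASCII-only identifiers
(the real isIDFIRST_lazy_if() also handles UTF-8) and using a hypothetical
function name:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring the quoted condition: returns true when the
 * text at s would be treated as the start of a "last index" expression
 * ($#foo, $#{...}, $#$foo, $#+, ...) rather than the variable $#. */
bool
looks_like_last_index(const char *s)
{
    char c;

    if (s[0] != '$' || s[1] != '#')
        return false;
    c = s[2];
    if (c == '\0')          /* strchr() would match the terminator, so guard */
        return false;
    /* ASCII approximation of "identifier start": a letter or underscore. */
    return isalpha((unsigned char)c) || c == '_'
        || strchr("{$:+-@", c) != NULL;
}
```

With this, "$#{$foo}" is recognised because the character after $# is '{',
while "$# {$foo}" is not, because that character is a space. That matches the
two runs shown above.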

As best I can figure out, one could add @*foo, &*foo, **foo, %*foo and $*foo
without removing anything or breaking any sane code on CPAN. The only code
which I think would change behaviour is code that is either using $* as the
variable for a file handle passed to print (anything else?), or code which
would parse $*+=1 as $*+ = 1 instead of $* += 1, or code which is making
array slices on %*.

So I think that the right thing to do is not to blanket deprecate parsing @*,
&*, **, %* and $*, but instead change the parser to warn or deprecate on the
specific ambiguous constructions. Which means that the "new" "needed"
constructions need to come first. Or at least some idea of them.

But, I think I'm wrong again, because the specific intent was to have
consistent "slurpy" syntax for subroutine signatures. Consistent with Perl
6, and consistent between the Perl 5 signature and regular Perl 5 code.

And I got this wrong. In that, Perl 6 does have @*foo. But that's a dynamic
scoping lookup. The slurpy syntax is *@foo. (And *%foo, and *$foo)
http://perlcabal.org/syn/S06.html#List_parameters

For which we don't need to worry about the various sigils used with the **
typeglob at all. We need to consider how the parser deals with the 3
typeglobs *@, *% and *$. And based on how $# and $#foo are handled, I think
that everything that is wanted for "new" syntax is currently a syntax error.
Or, if not, all that is currently legal syntax is incredibly obscure corner
cases.


So the net result of all of this was better tests, a bit better understanding 
of another 0.1% of the tokeniser, and a bug fix, in that $* and $# now warn for
every location that references them. Previously there were "holes" through
which they could be used but avoid the warning. A lot of motion, but not much
movement.

Nicholas Clark


