tokenizer hints, supporting delimited identifiers or symbols

From:
Darren Duncan
Date:
February 7, 2006 15:22
Subject:
tokenizer hints, supporting delimited identifiers or symbols
Message ID:
p06230900c00ec8298ea1@[192.168.1.100]
All,

I would like there to be a simple and terse way for Perl 6 
identifiers or symbols, including variable and subroutine names, to 
be composed of any characters whatsoever, even whitespace, as is 
possible in some other languages like SQL, and as is possible when 
naming filesystem files.

I also want to emphasize that what I'm looking for is simply a 
compile time feature; the delimited identifiers are always literal 
constants resolvable at compile time, so there is no possible 
deferral to runtime like with symbolic references that can come from 
variables.

This would assist in having a closer mapping when porting code from a 
language like PL/SQL to Perl, or when invoking code in such languages, 
while also gaining that ability natively.  Simply remapping 
characters, like spaces to underscores, won't work, partly because of 
clashes, such as when the source already has both a "the var" and a 
"the_var".  Certain other workarounds, like hex-escaping all source 
identifiers, would obfuscate the result and make it harder to 
understand.
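
To make the clash concrete, here is a minimal sketch in Python (not 
Perl 6) of the two workarounds mentioned above.  The helper names 
remap_spaces and hex_escape are hypothetical, and the "_xNN_" escape 
format is just one assumed encoding chosen for illustration.

```python
# Hypothetical helpers for illustration; not part of any real tooling.

def remap_spaces(name):
    """Naive workaround: turn every space into an underscore."""
    return name.replace(" ", "_")

def hex_escape(name):
    """Alternative workaround: hex-encode every non-alphanumeric character.

    The "_xNN_" format is an assumption made for this sketch.
    """
    return "".join(c if c.isalnum() else "_x%02X_" % ord(c) for c in name)

source_names = ["the var", "the_var"]

# Both source names collapse to the same target name.
remapped = [remap_spaces(n) for n in source_names]

# The names stay distinct, but neither is pleasant to read.
escaped = [hex_escape(n) for n in source_names]

print(remapped)
print(escaped)
```

The first list holds the same name twice, which is the collision; the 
second keeps the names apart at the cost of readability.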

In a way, this would be a wider application of the fact that hash 
keys can already contain any characters, or that named parameter 
arguments can be string-quoted, though the latter are akin to 
identifiers in the method declarations.

Unless it's already done, I see that support for this is something 
that only the tokenizer, and perhaps the wider parser, of Perl 6 code 
has to be concerned with, and all other parts of the Perl 6 runtime 
don't have to care.  Really, one main reason it isn't commonplace to 
have, say, space characters in variable names is that they could make 
the parser's job more difficult when determining the boundaries of a 
symbol name in code.

I propose that this can be accomplished with a simple and optional 
de-sugaring of the language that simply provides clues to the 
tokenizer in the form of special delimiters.

For example, if Perl 6 doesn't currently have back-tick (`) 
delimiters reserved (I forget) like Perl 5 does for invoking the Unix 
shell, we could use that; literal occurrences of the delimiter 
character in the identifier would be backslash-escaped as usual, like 
with single-quote (') delimited strings.  Or, if you consider this 
to be rarely used, we could Huffman-code it to have a longer 
delimiter like "qi()" or "qs()" or something.
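
As a rough sketch of what the tokenizer would need to do, the 
following Python (not Perl 6) function reads one delimited identifier 
out of a source string.  The function name read_delimited_identifier 
is hypothetical, and the escape rule is an assumption modelled on 
single-quoted strings: a backslash only escapes the delimiter or 
another backslash, and is otherwise literal.

```python
from typing import Tuple

def read_delimited_identifier(src: str, pos: int, delim: str = "`") -> Tuple[str, int]:
    """Read a delimited identifier whose opening delimiter is at src[pos].

    Returns (name, index just past the closing delimiter).
    """
    if src[pos:pos + 1] != delim:
        raise ValueError("expected opening delimiter at position %d" % pos)
    chars = []
    i = pos + 1
    while i < len(src):
        c = src[i]
        if c == "\\" and src[i + 1:i + 2] in (delim, "\\"):
            # Escaped delimiter or backslash: keep the escaped character only.
            chars.append(src[i + 1])
            i += 2
        elif c == delim:
            # Unescaped delimiter closes the identifier.
            return "".join(chars), i + 1
        else:
            chars.append(c)
            i += 1
    raise ValueError("unterminated delimited identifier")

# The sigil is at index 0, so the delimiter starts at index 1.
name, end = read_delimited_identifier("$`the bar` = 5", 1)
print(repr(name), end)
```

The point is that the symbol boundary is found purely from the 
delimiters, so whitespace inside the name needs no special handling.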

If the delimited identifier would also be valid as a non-delimited 
identifier (because it contains only alphanumerics, for example), 
which is the form Perl 6 code uses by default, then the delimited and 
non-delimited versions of the same name can be intermixed as 
equivalent; otherwise (e.g., if the name contains whitespace), it can 
appear only in delimited form.
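
One way to picture this rule is a function that chooses the written 
form for a given symbol name.  This is only a sketch in Python (not 
Perl 6); render_identifier is a hypothetical name, and the ASCII-only 
pattern is a simplifying assumption rather than the real Perl 6 
identifier grammar.

```python
import re

# Assumption: a simplified, ASCII-only notion of a bare identifier.
BARE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def render_identifier(name, delim="`"):
    """Return the bare form when it is valid, else the delimited form."""
    if BARE_IDENTIFIER.fullmatch(name):
        # "foo" and the delimited spelling of foo name the same symbol,
        # so the shorter bare form is enough.
        return name
    # Escape backslashes first, then the delimiter itself.
    escaped = name.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return delim + escaped + delim

for symbol in ["foo", "the bar", "a`b"]:
    print(render_identifier(symbol))
```

In each case the symbol table would store only the plain name; the 
delimiters are purely a matter of how the name is spelled in source.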

Using the back-ticks as an example, we could say:

   my $baz = 7; # parsed symbol is "baz"
   say $baz;    # parsed symbol is "baz"

   my $`foo` = 3; # parsed symbol is "foo"
   say $`foo`;    # parsed symbol is "foo"

   my $`the bar` = 5; # parsed symbol is "the bar"
   say $`the bar`;    # parsed symbol is "the bar"

Similarly, with subroutine or method names:

   method `do it` (:$`with this`) { ... }

   $myobj.`do it`( 'with this' => 17 );
   $myobj.`do it`( :`with this`<44> );

Note that named arguments can already have string-quoted key names, I 
think; this is sort of an extension of that.

Of course, the exact syntax can be different, but I do not want to 
lose functionality in Perl 6 that I have in other languages and 
environments.

Unless we have this feature, I would have to resort to either storing 
all symbols in hashes, or hex-escaping them all to ensure usable 
characters without name collisions, and that makes the resulting code 
obfuscated and hard to understand; I don't want to obfuscate.

Thank you. -- Darren Duncan