develooper Front page | perl.perl5.porters | Postings from October 2017

Re: Unicode operators

October 22, 2017 16:40
Philip R Brenan wrote:
>Please give me some idea of how to add these operators to Perl for real?

You'd have to start by solving the issue of source encoding.  In theory
some non-ASCII characters can already be used as part of Perl source,
in string literals and in identifiers, but this doesn't work properly.
The treatment of such characters, their semantic effect, depends on the
way in which the source is encoded, even when perl can correctly decode
source files to the same character sequence.
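
To make that concrete, here is a minimal sketch of the underlying effect.
It does not use real source files: it assumes that a Latin-1 source file
without "use utf8" yields the first representation below and a UTF-8
source file under "use utf8" yields the second, and it imitates the two
with utf8::upgrade() so that the example itself stays pure ASCII.
The variable names are purely illustrative.

```perl
use strict;
use warnings;
# Deliberately no "use utf8", no "use feature 'unicode_strings'" and no
# "use v5.12" or later: the legacy default string semantics are the point.

# The same one-character string (U+00E9) in two internal representations.
my $native   = "\xE9";          # stored as a single byte
my $upgraded = "\xE9";
utf8::upgrade($upgraded);       # same character, stored as UTF-8 internally

# As character sequences the two strings are equal...
my $same_chars = $native eq $upgraded ? 1 : 0;

# ...but under the default regexp rules \w only treats U+00E9 as a word
# character when the string happens to be stored as UTF-8.
my $native_is_word   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_is_word = $upgraded =~ /\w/ ? 1 : 0;

print "same characters:    $same_chars\n";
print "native matches \\w:   $native_is_word\n";
print "upgraded matches \\w: $upgraded_is_word\n";
```

The two strings compare equal, yet only the upgraded one matches \w.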

Incidentally, if you want non-ASCII characters to be accepted in
identifiers then there's a big unsolved problem about how such package
names should map to filenames for module loading.  There's a big
portability problem around using non-ASCII filenames, so you probably
want to map non-ASCII module names to pure ASCII filenames, but it'd
be quite a wrench to introduce such a name mangling layer where there
previously hasn't been one.  But if you use non-ASCII filenames, even
if the platform supports them, you'd need to fix the Unicode bug that
still exists in Perl's handling of filenames, one that remains because
it's especially tricky, and gets even trickier over time as the present
behaviour gets ever more entrenched through practice.
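
For illustration, the existing name-to-filename rule and one possible
mangling layer might be sketched like this.  module_to_path() mirrors
the familiar "Foo::Bar" to "Foo/Bar.pm" convention; the function
mangle_module_to_path() and its "_xHHHH_" escape format are entirely
hypothetical, invented here only to show the kind of layer that would
have to be introduced.  Nothing like it exists in perl.

```perl
use strict;
use warnings;

# Existing convention: package Foo::Bar is looked up as Foo/Bar.pm.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# HYPOTHETICAL mangling layer: replace every non-ASCII character with
# "_x" + at least four hex digits of its code point + "_", so that the
# resulting filename is pure ASCII.  Note that this naive scheme is not
# collision-free: an ASCII module name that already contains "_x00E9_"
# would map to the same file.
sub mangle_module_to_path {
    my ($module) = @_;
    my $path = module_to_path($module);
    $path =~ s{([^\x00-\x7F])}{sprintf "_x%04X_", ord $1}ge;
    return $path;
}

print module_to_path("Foo::Bar"), "\n";
print mangle_module_to_path("Caf\x{E9}::\x{3A9}"), "\n";
```

The second call prints a pure ASCII path for a module name containing
U+00E9 and U+03A9.  The collision noted in the comment is one small
example of why retrofitting such a layer is awkward.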

That's when perl is decoding correctly.  There's another layer of problems
around perl being bad at determining how a source file is encoded.
Regrettably, we don't have any sane way to signal to Perl what encoding
is used by a source file.  When performing file operations in Perl it is
up to the code requesting the operation to specify the file's encoding.
This sits uneasily with the system of reading code from a file specified
purely by name.
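
The contrast can be sketched as follows.  For a data file the caller
names the encoding through an I/O layer; the temporary file and the
variable names here are only scaffolding for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write "caf" followed by U+00E9 as UTF-8 bytes to a scratch file.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "caf\xC3\xA9\n";
close $out or die "close: $!";

# Data file: the code doing the reading states the encoding explicitly.
open my $as_utf8, '<:encoding(UTF-8)', $filename or die "open: $!";
chomp(my $decoded = <$as_utf8>);
close $as_utf8;

# The same file read with no decoding at all.
open my $as_bytes, '<:raw', $filename or die "open: $!";
chomp(my $raw = <$as_bytes>);
close $as_bytes;

printf "decoded: %d characters, raw: %d bytes\n",
    length $decoded, length $raw;

# By contrast, "require Some::Module", "use Some::Module" and
# "do 'file.pl'" take only a name: the requesting code has nowhere to
# say how the file is encoded.
```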

That's all just a threshold issue: that's what needs to be sorted out
for it to be acceptable in principle for us to make use of non-ASCII
characters in the language.  There's then a separate set of issues about
actually adding operators.

Currently, it is only technically feasible to add punctuation-type
operators directly in the perl core.  We've had some talk in the past
about a plugin mechanism for operators, but there's been no motion
on that for a while, and we're at least several steps away from being
able to add it.  So for the time being you'd have to convince us that
the specific operators being proposed would be worth the additional
maintenance burden in the core, and worth the additional complexity of
the base language for all users.  If the operators being proposed are
just alternate spellings of existing operators, as in the module you
pointed at, this criterion is unlikely to be satisfied.  If an operator
plugin mechanism is forthcoming, however, you'd be free to make a module
providing whatever operators you want.

There's an additional problem with adding aliases for "=>" and "->":
they have funny syntactic effects on their lhs operand, beyond their
semantics as operators.  Those effects are presently implemented
by hardcoded lookaheads for those character sequences.  An operator
plugin mechanism, if available, would quite likely not support this
kind of effect.  If adding an alias in the core it would of course
be possible to extend the core's lookahead code to look for the alias
too, but it's pretty likely that, due to the total lack of genericity,
there's also similar lookahead code in syntax-tweaking modules on CPAN,
which would not naturally know about the alias.
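
The effect of "=>" on its lhs is easy to demonstrate with a short
example.  It shows only the documented, user-visible behaviour, not the
lookahead code itself; the array names are illustrative.

```perl
use strict;
use warnings;

# With "=>", a bareword on the left is quoted, even when it is the name
# of a builtin function: the first element is the string "time".
my @fat = (time => 1);

# With a plain comma there is no such effect: time() is called and the
# first element is the current epoch time.
my @comma = (time, 1);

print "fat comma:   $fat[0]\n";
print "plain comma: ", ($comma[0] =~ /\A\d+\z/ ? "a number" : "not a number"), "\n";
```

An alias for "=>" that was merely an operator with comma semantics
would behave like the second case, which is why the lookahead has to
know about every spelling.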

Finally, there are issues specific to non-ASCII operators.  To add
any such to the core, we'd have to decide whether it is in principle
a good idea to add such operators to the Perl 5 language.  We haven't
really discussed this yet: it's currently pretty much moot because of
the long-unsolved problems (discussed above) in Unicode source handling
that are exhibited by the limited Unicode usage that we already have.
If we were to adopt Unicode operators, we'd almost certainly adopt a rule
that each Unicode operator in the core language must have a pure ASCII
spelling, a rule that Perl 6 has.  But of course that means that the gain
from Unicode operators would be only cosmetic, making it an unattractive
cause in which to accept much cost.  As with adding any operator, there
are costs not only in core maintenance, but also in the cognitive burden
on Perl programmers.  Even programmers who don't use the operators in
their own code need to know about them to understand other people's code.

So, overall, quite a few big issues standing in the way of Unicode
operators.  There has been almost no progress on any of these issues in
the past several years.  If you want to contribute, I suggest that you
not work explicitly towards a goal of Unicode operators, but instead pick
one of the subproblems to concentrate on.  It would be especially helpful
to address the problems that are exhibited by existing language features.

