develooper Front page | perl.perl6.announce.rfc | Postings from September 2000

RFC 328 (v3) Single quotes don't interpolate \' and \\

From:
Perl6 RFC Librarian
Date:
September 30, 2000 13:18
Subject:
RFC 328 (v3) Single quotes don't interpolate \' and \\
Message ID:
20000930201744.29321.qmail@tmtowtdi.perl.org
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Single quotes don't interpolate \' and \\

=head1 VERSION

  Maintainer: Nicholas Clark <nick@talking.bollo.cx>
  Date: 28 Sep 2000
  Last Updated: 30 Sep 2000
  Mailing List: perl6-language@perl.org
  Number: 328
  Version: 3
  Status: Frozen

=head1 CHANGES

Reissued on perl6-language@perl.org - I goofed the list.

Clarified the description slightly; by single quoted string I mean '' and q()

Updated discussion section

Frozen not withdrawn (see discussion section)

=head1 DISCUSSION

I'm in two minds as to whether to freeze or retract this RFC

Reaction was strongly polarised; three strongly against and one strongly
for. People valued their ability to use single quotes to easily make
strings containing single quotes. Michael Fowler expresses

    Whew.  Disallowing escapes in a single-quote string does not make easy
    things easier and hard things possible.

    I'm not arguing that we should keep it simply because people are used
    to it, but instead we should keep it because it's useful.

My view was that the majority are against the change, but views were from
B<existing> perl users [who do you expect as the majority on perl6 lists?
:-)]. The change would penalise existing perl users, but benefit new perl
users (and presumably people teaching perl).

However, I'm wrong on that. Hildo Biersma states

   Now, I have been teaching perl for a number of years, and nobody's ever
   had trouble with understanding how single quotes and the two escapes
   work.  Plenty of people find double-quotes either too powerful or too 
   limited (see the various RFCs), but I think single quotes are fine

However, there was no comment on the secondary issue of how single
quotes treat unrecognised escapes. To me the following seems wrong:

   $  perl -lwe "print q(Quoted \( \\ \) Not \' \t \z)"
   Quoted ( \ ) Not \' \t \z

And from the archives:

   Does it strike anyone else as odd that 'foo\\bar' eq 'foo\bar'?         
 
(Steve Fink, http://www.mail-archive.com/perl6-language@perl.org/msg04008.html)

I conclude

=over 4

=item 1 Removing C<\> escaping of C<\> and the delimiter from '' and q() would make perl less useful to the majority

=item 2 How '' and q() deal with C<\> followed by a non-escaped character isn't a major concern to perl programmers

=back

hence escaping as is should remain, but it would be possible to change the
unrecognised escape behaviour (say C<\z> maps to C<z> like a "" string)
without causing pain, if this change were deemed sensible. My view is that
(2) should be considered, hence I freeze rather than withdraw the RFC.

=head1 ABSTRACT

Remove all interpolation within single quotes and the C<q()> operator, to
make single quotes 100% shell-like. C<\> rather than C<\\> gives a single
backslash; use double quotes or C<q()> if you need a single quote in your
string.

=head1 DESCRIPTION

Camel III (page 7) says "Double quotation marks (double quotes) do
I<variable interpolation> and I<backslash interpolation> while single quotes
suppress interpolation."  Page 60 qualifies this with "except for C<\'> and
C<\\>".

In perl single quotes are used to generate strings. Double quotes also generate
strings.

In C single quotes are used to make character constants. Double quotes are
used to make string constants. Backslash interpretation is performed in
single quotes in C. While multi-character constants are allowed by C, they
are strongly discouraged as they are non-portable, and a character constant
in C is a type distinct from a string constant. Hence double quotes and
single quotes signify different things.

In shell, single quotes are used to make strings. Double quotes also make
strings. Within single quotes backslashes are ordinary characters, and do
not quote anything. As one can't quote a C<'> with a C<\> there is no way to
interpolate a single quote within a single quoted string, but a workaround
such as C<'don'\''t'> relying on the concatenation of C<'don'> C<\'> and
C<'t'> achieves the desired results.

Hence perl's single quoted strings are analogous to shell's single quoted
strings, not C's. However, they're not identical, as perl allows C<\\> to
mean an embedded C<\>, C<\'> to mean an embedded C<'>.

This RFC argues that the exception is confusing and proposes to remove it.
This makes perl more regular in shell terms, and slightly more easy to learn
for the shell programmer.

It also makes perl internally simpler more regular. Currently the behaviour
for C<q()> strings is that C<\(> C<\)> and C<\\> map to 1 character,
C<\>I<?> for all other I<?> maps to 2 characters. C<qq()> differs as
C<\>I<?> maps to 1 character both when I<?> is recognised as a backslash
escapes, B<and> when it is unrecognised. A further irregularity is that
currently single quoted here docs don't interpolate C<\\> or C<\'>. The
consequence of this is that currently

	'foo\\bar' eq 'foo\bar'

which sure looks odd.

With this RFC it is proposed that in a single quoted string and the q()
operator C<\> is not special. Hence C<\>I<?> always maps to 2 characters
(C<\> then I<?>) unless I<?> is the closing terminator, in which case the
string terminates with that C<\> . Single quoted strings behave like single
quoted here docs, and like shell single quoted strings.

You don't lose any functionality, as

   'don\'t implement this RFC, the benefits don\'t outweigh the confusion'

can still be written

   q(don't implement this RFC, the benefits don't outweigh the confusion)

which is actually less typing.

=head1 IMPLEMENTATION

Modify the tokeniser/lexer not to treat C<\> as special, hence the first end
delimiter ends the string.

For 5.7's toke.c this doesn't appear that simple. it looks like
modifications would be needed to C<S_tokeq>, C<scan_str> and C<Perl_yylex>
(for a quoted string at the start of curlies). There are probably more; the
code that makes single quoted strings interpolate C<\'> and C<\\> appear to
be deeply ingrained into the core.

The perl5 to perl6 converter would need to convert single quoted strings and
C<q()> operators containing C<\'> to the shortest (clearest?) equivalent of:

=over 4

=item *

double quoted string with C<\Q>...C<\E> or  C<\> escapes

=item *

single quoted string(s) and explicit concatenation of C<"'">

=item *

C<q()> operator with delimiters not in the string

=back

Single quoted strings containing C<\\> but no quoting of delimiters would
need to have C<\\> converted to C<\>

=head1 REFERENCES

Camel III - Programming Perl (3rd Edition)

perlop manpage for interpolation

RFC 226: Selective interpolation in single quotish context.

I believe that Larry Wall once made a comment about \' and \\ in single
quoted strings being a mistake, but I can't find any reference. The idea
certainly isn't mine, but I feel it worthy of consideration, even if
considered opinion is that any gain is to small to outweigh the upheaval.




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About