Re: \Questions about the \Future of \Escapes
From:
Nicholas Clark
Date:
January 11, 2012 07:18
Subject:
Re: \Questions about the \Future of \Escapes
Message ID:
20120111151849.GG9069@plum.flirble.org
On Wed, Jan 04, 2012 at 04:40:55PM -0500, Ricardo Signes wrote:
> Meanwhile, \L and \U are already problems. They need to be documented and
> so on. \F will have *exactly* the same class of problem, which is almost
> a good thing. It means when we fix \L and \U, in some way, as we probably
> must, our fix will be exactly the same category of fix.
>
> I think that we're better off landing the well-discussed, reasonable \F, and
> fixing the general escape issue at the appropriate pace.
Yes, I'd much prefer to have a \F now that is consistent with the current
\L and \U, and so no better or worse than them, given that fixing things
looks involved:
On Mon, Jan 09, 2012 at 04:12:28PM +0100, demerphq wrote:
> Thanks. I spent a bunch of time this weekend working on this and I
> think I have come up with the rules, and they are, well, whacky.
Aye.
> Given how insane they are, I am doubtful we can fix the behavior
> without breaking stuff. I also do not think that we should warn on
> useless use of "\E" given how easy it is to make a useful \E turn into
> a useless \E.
I'm not convinced that we shouldn't warn, but I don't hold that view strongly.
I may not need any persuading to change my mind as things wash out.
I think I can see a path to changing the behaviour with minimal breakage,
in that
1) the changed paradigm only happens in the scope of a use v5.18.0
2) regular expressions
   a) remember the paradigm that they were compiled under,
   b) encode this in their (?^) prefix when stringified,
   c) use this when interpolated into a regex in a different paradigm
This adds complexity. But it avoids the possible problems of either
1) a global change to the new paradigm, with a (small?) backwards
   compatibility risk
or instead
2) the new paradigm being lexically controlled, but with "surprises" from
   action at a distance, depending on whether compiled regex objects passed
   around are used directly, or stringified, interpolated and recompiled
I don't know which is worse. Action at a distance definitely troubles me.
> FWIW, I got pretty far with pushing \u\U\l\L\E into the regex engine,
> but it ran into some issues. We probably want to let the toker handle
> \Q \E, as otherwise (?{ ... }) gets really tricky. However in order to
> support \Q "properly" the toker must know about the \U, \L and \E's as
> well.
\Q in the tokeniser, \L etc elsewhere, and both sharing \E troubles me.
This may be unfounded on my part. I'd also hate for qr// and "" to
diverge further, i.e. do double quoted strings need a similar imposition
of sanity?
> For now I think the most important thing to do is decide how these
> things SHOULD work and then make it all work like that. For instance,
> I think it is insane that "\U\QX\LY" results in only X being
> quotemeta'ed, but "\Q\UX\LY" results in both X and Y being
> quotemeta'ed.
Yes, definitely figure out what the rules should be.
> Here are the rules:
>
> \U and \L case-modify non-casemod text until the end of the string or
> the next relevant encountered \E, if there is already an unterminated
> \L or \U in effect then the new \U or \L will end any still in effect
> casemodifiers (note: this is not a typo \Q does not end any previous
> \Q \U or \L, but \U and \L do end any previous \Q).
>
> $ perl -le'print "\U[one]\Q[TWO]\L[THREE]"'
> [ONE]\[TWO\][three]
> $ perl -le'print "\Q[one]\U[TWO]\L[THREE]"'
> \[one\]\[TWO\]\[three\]
I think you misdescribed just the parenthesised note, as your text
contradicts your second example. \U and \L terminate any previous \U or \L,
acting as an implicit \E at that point.
Also consistent with your description, though you didn't show it as an
example: \Q nests, which I suspect is safe to declare a "bug" and just fix:
$ ./perl -le 'print "\Q[one]\Q[two]\E[three]\E"'
\[one\]\\\[two\\\]\[three\]
$ ./perl -le 'print qr/\Q[one]\Q[two]\E[three]\E/'
(?^:\[one\]\\\[two\\\]\[three\])
> \l and \u case-modify the next non-casemod text in the string, or
> nothing if there is no non-casemod text in between it and the next \U
> \Q \L or \E. Any preceding \L or \U take precedence, except in the
> case where the \l or \u immediately follow an \L or \U in which case
> the \l or \u take precedence.
> $ perl -le'print "\lFOO"'
> fOO
> $ perl -le'print "\ufoo"'
> Foo
> $ perl -le'print "\L\ufoo \ubar"'
> Foo bar
> $ perl -le'print "\U\lfoo \lfoo"'
> fOO FOO
That \L or \U take precedence is, um, strange, counterintuitive and less
useful than the other way round would be.
Nicholas Clark