perl.perl5.porters: postings from April 2000
Re: PATCH: perlre.pod (against 5.6.0)
From: Hugo
Date: April 29, 2000 12:53
Subject: Re: PATCH: perlre.pod (against 5.6.0)
Message ID: 200004292001.VAA28216@crypt.compulink.co.uk
In <14962.957028538@chthon>, Tom Christiansen writes:
:*** perlre56.pod Sat Apr 29 09:42:35 2000
:--- perlre.pod Sat Apr 29 11:13:51 2000
Below is a patch with some minor fixes. Here are some other comments:
ll153-4: Otherwise, the lefter one always wins.
Cute though it is, I'd rather see something like 'the leftmost of the
two'.
l172: octal char (think of a PDP-11);
Does 'think of a PDP-11' actually help anyone understand this?
(I appreciate this phrase was not introduced by Tom's patch.)
l179: \u titlecase next char
I'm not sure what 'titlecase' means, why it is more accurate than
'uppercase', or why \U was not similarly changed.
ll577-580:
For reasons of security, this construct is normally forbidden if
the regex involves variable interpolation, unless the perilous C<use
re 'eval'> pragma has been used (see L<re>), or the variables contain
results of C<qr//> operator (see L<perlop/"qr/STRING/imosx">).
I don't think this is correct.
I _think_ the story is that without C<use re 'eval'>, you cannot combine
variables and code within a regexp, but that you can put code into a
qr// regexp (as long as you don't mix in variables to be interpolated),
and then interpolate the variable containing the evalable qr// pattern
along with other variables into a new pattern. Thus this is allowed:
$y = qr/(?{ "code here" })/;
/$x$y$z/;
... but these aren't:
$x = qr/./; /(?{ "code here" }) $x/;
$x = qr/(?{ "code here" }) $y/;
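A minimal Perl sketch that exercises only the two cases not in doubt. First, a qr// object carrying a code block is interpolated alongside ordinary variables. Second, a plain string containing (?{ }) is interpolated at run time without use re 'eval', and that is rejected. The variable names are illustrative. The mixed cases above are deliberately left out, because this sketch does not settle them.

```perl
use strict;
use warnings;

# Case 1: the code block is compiled inside qr//, and the resulting
# object is then interpolated with ordinary variables into a new pattern.
my $ran = 0;
my $y   = qr/(?{ $ran = 1 })/;
my ($x, $z) = ('a', 'b');
my $matched = ('ab' =~ /$x$y$z/) ? 1 : 0;

# Case 2: the same kind of code block arrives as a plain string at run
# time. Without use re 'eval', compiling the pattern dies.
my $code_string = '(?{ 1 })';
my $rejected = eval { 'a' =~ /a$code_string/; 1 } ? 0 : 1;
my $error    = $@;

print "qr// interpolation matched: $matched, code ran: $ran\n";
print "string interpolation rejected: $rejected\n";
```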
l609: Execute I<code> and interpolate its result as more pattern.
I think 'as a subpattern' might be more accurate, since you can't
say, for example, /(??{ "(" }) . (??{ ")" })/.
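A small Perl sketch of the 'subpattern' reading. The string returned by (??{ }) is compiled and matched as a complete pattern in its own right. A lone "(" is therefore not valid, and the match dies instead of pairing the "(" with a later ")". The variable names are illustrative only.

```perl
use strict;
use warnings;

# (??{ code }) runs the code during the match and matches its result
# as a self-contained subpattern.
my $inner   = 'b+';
my $matched = ('abbbc' =~ /^a(??{ $inner })c$/) ? 1 : 0;

# A lone "(" is not a complete pattern, so compiling the result of the
# code block fails at match time and the match dies.
my $rejected = eval { 'x' =~ /(??{ "(" })x/; 1 } ? 0 : 1;
my $error    = $@;

print "subpattern matched: $matched\n";
print "lone '(' rejected: $rejected\n";
```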
ll638-9:
This is mostly useful as an efficiency hack
to optimize of what would otherwise be "eternal" matches [...]
Should this read "to optimize what would" or "to break out of what would"?
ll695-7:
Be aware, however, that this pattern currently
triggers a warning message under the C<use warnings> pragma or B<-w>
switch saying it C<"matches the null string many times">.
I was unable to find evidence of this in any version of perl I have
here. I appreciate this sentence was not introduced by Tom's patch.
Hugo
--- pod/perlre.pod.old Sat Apr 29 19:47:02 2000
+++ pod/perlre.pod Sat Apr 29 20:57:00 2000
@@ -59,7 +59,7 @@
These are usually written as "the C</x> modifier", even though the
delimiter in question might not really be a slash. Any of these
modifiers may also be embedded within the regex itself using the
-C<(?I<flags>...) construct. See below.
+C<(?I<flags>...)> construct. See below.
The C</x> modifier itself needs a little more explanation. It tells
the regex parser to ignore whitespace that is neither backslashed
@@ -68,7 +68,7 @@
is also treated as a metacharacter introducing a comment, just as
in ordinary Perl code. This also means that if you want real
whitespace or C<#> characters in the pattern (outside a character
-class, where they are unaffected by C</x>), that you'll either have
+class, where they are unaffected by C</x>), you'll either have
to escape them or encode them using octal or hex escapes. Taken
together, these features go a long way towards making Perl's patterns
more readable. Note that you have to be careful not to include the
@@ -94,7 +94,7 @@
() Grouping
[] Character class
-By default, the C<^> metacharacter is matches only the beginning
+By default, the C<^> metacharacter matches only the beginning
of the string, the C<$> metacharacter only before an optional
trailing newline at the end, so Perl does certain optimizations
with the assumption that the string contains only one line. Embedded
@@ -353,7 +353,7 @@
interpreting C<\10> as a backreference only if at least 10 left
parentheses have opened before it. Likewise C<\11> is a backreference
only if at least 11 left parentheses have opened before it. And
-so on. C<\1> through C<\9> are always interpreted as backreferences."
+so on. C<\1> through C<\9> are always interpreted as backreferences.
Examples:
@@ -377,7 +377,7 @@
everything after the matched string.
The numbered variables ($1, $2, $3, etc.) and the related punctuation
-set (C<<$+>, C<$`>, C<$&>, and C<$'>) are all automatically localized
+set (C<$+>, C<$`>, C<$&>, and C<$'>) are all automatically localized
to the enclosing dynamic scope. Their values are therefore ephemeral
and best copied into more enduring variables. (See L<perlsyn/"Compound
Statements">.)
@@ -426,7 +426,7 @@
Perl also defines a consistent extension syntax for features not
found in standard tools like B<awk> and B<lex>. The syntax is a
pair of parentheses with a question mark as the first thing within
-the parentheses, such as C<(?I<X>...). The value of I<X> after the
+the parentheses, such as C<(?I<X>...)>. The value of I<X> after the
question mark determines which extension is selected.
Stability of these extensions varies widely. Some have been part
@@ -536,7 +536,7 @@
B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.
-This zero-width element evaluates to any embedded Perl code.
+This zero-width element evaluates any embedded Perl code.
Currently, the rules to determine where the C<code> ends are somewhat
convoluted. It is not an assertion, because it does not assert
anything: the success of the match is unrelated to the code's return
@@ -567,7 +567,7 @@
This construct may be used as a C<(?(condition)yes-pattern|no-pattern)>
switch. If I<not> used in this way, the result of evaluation of
C<code> is put into the special variable C<$^R>. This happens
-immediately, so C<$^R> can be used from other C<(?{ code })> assertions
+immediately, so C<$^R> can be used from other C<(?{ code })> elements
inside the same pattern.
The assignment to C<$^R> above is properly localized, so the old
@@ -694,7 +694,7 @@
finishes in a fourth the time when used on a similar string with
1000000 C<a>s. Be aware, however, that this pattern currently
triggers a warning message under the C<use warnings> pragma or B<-w>
-switch saying it C<"matches the null string many times">):
+switch saying it C<"matches the null string many times">.
On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable
effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>.
@@ -746,7 +746,7 @@
=head2 Backtracking
-NOTE: This section presents an abstract approximation of the how
+NOTE: This section presents an abstract approximation of how
the regex engine behaves. For a somewhat more rigorous (and harder
to understand) view of the rules involved in selecting a match among
possible alternatives, see L<Combining pieces together>.
@@ -1114,7 +1114,7 @@
$_ = 'bar';
s/\w??/<$&>/g;
-results in C<"<><b><><a><><r><>">. At each position of the string
+results in C<< "<><b><><a><><r><>" >>. At each position of the string
the best match given by non-greedy C<??> is the zero-length match,
and the I<second best> match is what is matched by C<\w>. Thus
zero-length matches alternate with one-character-long matches.
@@ -1166,7 +1166,7 @@
substrings that can be matched by C<S>, C<B> and C<B'> are substrings
which can be matched by C<T>.
-If C<A> is better match for C<S> than C<A'>, C<AB> is a better
+If C<A> is a better match for C<S> than C<A'>, C<AB> is a better
match than C<A'B'>.
If C<A> and C<A'> coincide: C<AB> is a better match than C<AB'> if
@@ -1238,8 +1238,8 @@
the functionality of the regex engine.
Suppose that we want to enable a new regex escape-sequence C<\Y|> that
-matches at boundary between white-space characters and non-whitespace
-characters. Note that C<(?=\S)(?<!\S)|(?!\S)(?<=\S)> matches exactly
+matches at the boundary between white-space characters and non-whitespace
+characters. Note that C<< (?=\S)(?<!\S)|(?!\S)(?<=\S) >> matches exactly
at these positions, so we want to have each C<\Y|> in the place of the
more complicated version. We can create a C<custom_re> module to do this:
@@ -1303,7 +1303,7 @@
L<perllocale>.
-L<perldebugs/"Debugger Internals">.
+L<perldebguts/"Debugger Internals">.
I<Mastering Regular Expressions> by Jeffrey Friedl, published
by O'Reilly and Associates.