On Fri, Nov 14, 2008 at 06:08:57AM +0000, Ben Morrow wrote:
>
> Quoth public@khwilliamson.com (karl williamson):
> >
> > One thing to consider, and I don't claim to know the answer, is what to
> > do about the newer single character shortcuts that have always (in their
> > short lives) been defined as matching non-ascii as well: \v, \h, their
> > complements, and I think without checking: \R. Do they only match
> > ascii, and one gets non-ascii by saying, eg, \pv ?
>
> I would say yes, definitely. That way there's a simple rule to follow: a
> single-char \x shortcut is ASCII-only, and to make it Unicode prepend \p
> or \P. The tricky case is \R: the whole point of \R is that it matches
> the Unicode definition of a newline, and that it's the recommended regex
> escape for that definition. That suggests it should stay as it is (it's
> already different from the others, as \r and \R are not complements).
\R is not a character class, as it can match the two-character string "\r\n".
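
A minimal sketch of that point, assuming perl 5.10 or later (where \R
is available); the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $crlf = "\r\n";

# \R consumes both characters of "\r\n" as a single unit.
my $r_len = $crlf =~ /\A(\R)\z/ ? length $1 : 0;

# A bracketed character class matches exactly one character, so it
# cannot cover the whole two-character string.
my $class_matches = $crlf =~ /\A[\r\n]\z/ ? 1 : 0;

print "\\R matched $r_len characters\n";
print "[\\r\\n] matched the whole string: $class_matches\n";
```

Here \R matches two characters, while the bracketed class fails to
match the full string.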
But there are other tricky cases: \C matches an octet, but such an
octet isn't limited to ASCII; it matches code points 128-255 as well.
And reducing \X to ASCII only is utterly silly. (OTOH, \X isn't a
character class, so it may be an exception like \R).
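
To illustrate why an ASCII-only \X would make little sense, here is a
small sketch. It assumes the usual meaning of \X (one extended grapheme
cluster); the sample string is my own choice, and \C is left out since
it is not available in every perl:

```perl
use strict;
use warnings;

# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# two code points, but a single grapheme cluster.
my $str = "e\x{301}";

# \X should swallow the base character and its combining mark together.
my $x_len = $str =~ /\A(\X)\z/ ? length $1 : 0;

# A single "." only covers one code point, so it cannot match the
# whole string on its own.
my $dot_matches = $str =~ /\A.\z/ ? 1 : 0;

print "\\X matched $x_len code points\n";
print ". matched the whole string: $dot_matches\n";
```

The second code point is outside ASCII, so restricting \X to ASCII
would defeat its purpose.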
> I'm not sure whether it's worth having both \Pw and \pW: on the one
> hand it's unnecessary redundancy, but on the other it makes things
> simpler to understand and remember.
>
> Having just checked the definition of \R in perlreref, I noticed the
> following entries:
>
> \H A non horizontal whitespace
> \V A non vertical whitespace
>
> Notwithstanding the missing hyphen, this parses as (/\s/ and not /\h/).
> Am I right in assuming this should be (not /\h/), and the wording should
> be amended? If so, what would be suitable? 'Anything but a horizontal
> whitespace', or simply 'The complement of \h'?
perlrecharclass says:
\h Match a horizontal white space character.
\H Match a character that isn't horizontal white space.
\v Match a vertical white space character.
\V Match a character that isn't vertical white space.
Having said that, "perlreref" is a *quick* reference to Perl's
regular expressions. I think the current wording is fine. Details
should be spelled out in perlre and perlrecharclass.
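
For what it's worth, a quick check of the "not /\h/" reading, assuming
perl 5.10 or later; the %is_H hash and the sample characters are only
for illustration:

```perl
use strict;
use warnings;

# Classify a letter, a space, a tab and a newline with \H.
my %is_H;
for my $ch ("a", " ", "\t", "\n") {
    $is_H{$ch} = $ch =~ /\A\H\z/ ? 1 : 0;
}

# Print each character by its code point to keep the output readable.
for my $ch (sort { ord $a <=> ord $b } keys %is_H) {
    printf "U+%04X: \\H %s\n", ord $ch,
        $is_H{$ch} ? "matches" : "does not match";
}
```

\H matches "a", which is not whitespace at all, as well as "\n"; it
rejects only the space and the tab. So \H is the plain complement of
\h, not "whitespace that isn't horizontal".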
Abigail