On Fri, Apr 29, 2011 at 12:21:21PM -0600, Karl Williamson wrote:
> On 04/29/2011 11:53 AM, Corzine, Deven wrote:
> > The programmer expects the case-insensitive flag to be a convenience to
> > avoid enumerating all case variations, much like the character class
> > negation is a convenience to avoid enumerating the entire character set
> > without a few unwanted characters.
>
> Again, I agree.

Except that negation can't actually be equivalent to enumerating the entire
character set less the unwanted characters, else this would match:
$ ./perl -Ilib -lwe '$_ = "ss"; utf8::upgrade($_); print /\A[^ ]\z/i ? "Y" : "N"'
N
because "all of Unicode less space" includes ß, and /ß/i matches "ss".

So negation is behaving like multiple negative lookahead assertions
followed by a match on qr/./ (i.e. consume exactly one code point)
[which makes sense to me now, but is a surprise if you're thinking in sets].

Aargh. Also, as this matches:
$ ./perl -Ilib -lwe '$_ = "ss"; utf8::upgrade($_); print /\A[\x80-\xFF]\z/i ? "Y" : "N"'
Y
shouldn't this?
$ ./perl -Ilib -lwe '$_ = "ss"; utf8::upgrade($_); print /\A[\x00-\xFF]\z/i ? "Y" : "N"'
N
(I was trying to test whether [^ ] was equivalent to [\x00-\x1F\x21-\x{1FFFF}]
and was surprised by the result)

Nicholas Clark