2009/12/10 Juerd Waalboer <juerd@convolution.nl>:
> demerphq skribis 2009-12-10 13:23 (+0100):
>> And, [[:word:]] is spelled [[:alnum:]].
>
> juerd@lanova:~$ perl -le'print "foo" =~ /[[:word:]]/'
> 1
>
> See perlre

See regexec.c and regcomp.c for the source of our mutual confusion.

>> You cannot have both the current behaviour and non buggy implementation.
>
> Fully agreed. That's certainly not what I'm after, either.
>
>> Simply put I consider that:
>> [^STUFF] matching the same code points as [STUFF] to be an irrefutable
>> and overwhelming reason why the current behavior of POSIX charclass
>> cannot be preserved.
>
> What exactly do you mean by "current behaviour"?
>
> To fix the issue that codepoints 128..255 are included depending on
> internal encoding, there are two options:
>
> - Ignore anything above 127
> - Provide full unicode semantics.
>
> The first, ASCII-only, would be a mistake.

No, it wouldn't. There are no "unicode semantics" for POSIX. It is a
fundamental error to speak of there being any.

> Perhaps there is other current behaviour that I am not aware of.

Apparently my hint wasn't strong enough.

Try matching all the legal codepoints against [^POSIX] and against [POSIX].

And note all the cases where you have both matching.

Then do it with the strings in unicode. Note all the errors.

These are fundamental errors.

For me this debate is over: POSIX charclasses are not Unicode
charclasses, and any contortion to try to make them so is futile and
doomed to screw stuff over.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
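
The experiment suggested above (match every code point against [POSIX] and [^POSIX], then repeat with the strings in unicode) could be sketched roughly as follows. This is only an illustration under stated assumptions: check_posix_class is a made-up helper name, the range 0..255 stands in for "all the legal codepoints", and "in unicode" is taken to mean a utf8::upgrade'd string. What it reports depends on the perl version in use, so no expected output is shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): for code points 0..255, match
# [[:CLASS:]] and [^[:CLASS:]] against the character stored as a native
# byte string and as a UTF-8 upgraded string, and collect the anomalies.
sub check_posix_class {
    my ($class) = @_;

    # Build the pattern source by concatenation to avoid any
    # interpolation surprises inside the bracketed class.
    my $pos_src = '[[:' . $class . ':]]';
    my $neg_src = '[^[:' . $class . ':]]';
    my $pos = qr/$pos_src/;
    my $neg = qr/$neg_src/;

    my (@both, @differ);
    for my $cp (0 .. 255) {
        my $bytes = chr($cp);
        utf8::downgrade($bytes);    # native (non-UTF-8) internal form
        my $chars = chr($cp);
        utf8::upgrade($chars);      # UTF-8 internal form, same code point

        my $b_pos = $bytes =~ $pos ? 1 : 0;
        my $b_neg = $bytes =~ $neg ? 1 : 0;
        my $u_pos = $chars =~ $pos ? 1 : 0;
        my $u_neg = $chars =~ $neg ? 1 : 0;

        # The class and its negation both claim the same code point.
        push @both, $cp if ($b_pos && $b_neg) || ($u_pos && $u_neg);

        # The answer changes with the internal encoding alone.
        push @differ, $cp if $b_pos != $u_pos;
    }
    return { both => \@both, differ => \@differ };
}

for my $class (qw(alpha alnum word)) {
    my $r = check_posix_class($class);
    printf "%-6s both=%d differ=%d\n",
        $class, scalar @{ $r->{both} }, scalar @{ $r->{differ} };
}
```

A non-zero "both" count is the [^STUFF]/[STUFF] overlap quoted above; a non-zero "differ" count is the dependence on internal encoding for codepoints 128..255.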