On Mon, Aug 26, 2013 at 11:00 PM, Karl Williamson <public@khwilliamson.com> wrote:

> The other option was to make \p{gc=unassigned} succeed for non-Unicode
> code points. But this isn't what Unicode says. A strict interpretation
> fails this because Unicode has never said that a non-Unicode code point
> should be considered unassigned. But I now believe it is more DWIM to
> consider them so.

If you're worried that someone might want to distinguish Unicode-but-unassigned from non-Unicode, then you could extend gc to include gc=NonUnicode. However, I suspect such a distinction is rarely needed, so it's probably better to include non-Unicode code points in gc=unassigned. Those who need to tell the two apart can use (?[ [\p{gc=unassigned}] - [\x{0}-\x{10FFFF}] ]) to match only non-Unicode code points, and (?[ [\p{gc=unassigned}] - [^\x{0}-\x{10FFFF}] ]) to match only unassigned Unicode code points. (Maybe provide \p{Unicode}?)

In general, it's clear that a non-Unicode code point should behave like a Unicode code point that lacks the property in question. No more /\p{XXX}/ and /\P{XXX}/ both being true for the same code point.
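To make the proposed semantics concrete, here is a rough model in Python rather than Perl. It is only a sketch under stated assumptions: it is not how Perl implements \p{gc=unassigned}, the function names are hypothetical, and Python's unicodedata module stands in for Perl's Unicode tables, so exact results for a given code point depend on the Unicode version Python ships. It shows gc=unassigned extended to everything above U+10FFFF, the two set differences from the regexes above, and the invariant that \p{...} and \P{...} are complements.

```python
import unicodedata

MAX_UNICODE = 0x10FFFF  # last code point in the Unicode code space


def is_unicode(cp: int) -> bool:
    # Models [\x{0}-\x{10FFFF}]
    return 0 <= cp <= MAX_UNICODE


def is_gc_unassigned(cp: int) -> bool:
    # Hypothetical model of the *proposed* \p{gc=unassigned}:
    # Cn inside Unicode, plus every code point above U+10FFFF.
    if not is_unicode(cp):
        return True
    return unicodedata.category(chr(cp)) == "Cn"


def is_not_gc_unassigned(cp: int) -> bool:
    # Models \P{gc=unassigned}: always the exact complement of \p{...},
    # so the two can never both be true for one code point.
    return not is_gc_unassigned(cp)


def non_unicode_only(cp: int) -> bool:
    # Models (?[ [\p{gc=unassigned}] - [\x{0}-\x{10FFFF}] ])
    return is_gc_unassigned(cp) and not is_unicode(cp)


def unicode_unassigned_only(cp: int) -> bool:
    # Models (?[ [\p{gc=unassigned}] - [^\x{0}-\x{10FFFF}] ])
    return is_gc_unassigned(cp) and is_unicode(cp)


if __name__ == "__main__":
    # U+0041 is assigned, U+FFFF is a noncharacter (gc=Cn), 0x110000 is non-Unicode
    for cp in (0x41, 0xFFFF, 0x110000):
        print(hex(cp), is_gc_unassigned(cp), non_unicode_only(cp),
              unicode_unassigned_only(cp))
```

Running it prints one line per sample code point; only 0x110000 lands in the non-Unicode-only set, and only U+FFFF lands in the Unicode-unassigned-only set.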