perl.perl5.porters | Postings from November 2003

Re: 5.8.1 perlre man page: [:punct:] vs. \p{IsPunct}

Jarkko Hietaniemi
November 2, 2003 09:26
> I just happened to notice that the perlre man page describes the 
> POSIX "[:punct:]" character class as being equivalent to the unicode 
> "\p{IsPunct}" character class.
> I haven't tried to track down the respective standards documents for
> POSIX and Unicode to see whether these classes are _supposed_ to be
> equivalent over the printable ASCII character set, but when I test them

AFAIK there are currently no standards defining those equivalences.
There has been some discussion about that on the Unicode Consortium
mailing lists, but there are in fact some doubts about the wisdom of
stating anything about such equivalences (because the C standards,
where the :foo: classes originate, frankly have no clue about the
more complex property structure of Unicode).

The closest upcoming standard is the proposed update to TR18 (Unicode
Technical Report #18, on Unicode regular expressions); see Annex C.

If you say :punct: on non-Unicode data, you are doing an _operating_
_system_ _dependent_ AND _locale_ _dependent_ operation.  :punct: and
\p{Punct} are (supposed to be) equivalent for Unicode data.
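For illustration only, here is a minimal sketch comparing the two definitions over printable ASCII. It is not the snippet from the original report, which is not reproduced here, and it is written in Python rather than Perl. It assumes two things: that the C locale's [:punct:] set is the one Python documents as string.punctuation, and that \p{Punct} means Unicode General Category P (Pc, Pd, Ps, Pe, Pi, Pf, Po).

```python
import string
import unicodedata

# Assumption: C-locale POSIX [:punct:] over ASCII. Python documents
# string.punctuation as the ASCII characters considered punctuation
# in the C locale.
posix_punct = set(string.punctuation)

# Assumption: \p{Punct} / \p{IsPunct} taken as Unicode General Category P*.
printable_ascii = [chr(cp) for cp in range(0x20, 0x7F)]
unicode_punct = {c for c in printable_ascii
                 if unicodedata.category(c).startswith("P")}

# Characters matched by one definition but not the other, in code-point order.
only_posix = sorted(posix_punct - unicode_punct)
only_unicode = sorted(unicode_punct - posix_punct)  # empty over printable ASCII

print("only [:punct:]:", "".join(only_posix))  # prints: only [:punct:]: $+<=>^`|~
print("only \\p{Punct}:", "".join(only_unicode))
```

The leftover characters, $+<=>^`|~, are ones Unicode classifies as symbols (Sc, Sm, Sk) rather than punctuation, so under these assumptions the two classes disagree even over printable ASCII.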

> in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
> demonstrate:

Jarkko Hietaniemi
"There is this special biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
