perl.perl5.porters | Postings from December 2001

Unicode SCRIPT and BLOCK names

Jeffrey Friedl
December 23, 2001 14:33

I just got bit by a 5.6->5.8 change in what \p{InTibetan} means.

In 5.6, \p{InTibetan} refers to a Unicode _block_.
In bleedperl, \p{InTibetan} now refers to the Unicode _script_.

Unicode scripts are superior to blocks... unless you were expecting block
semantics, as legacy code does. It turns out that
is not part of the script, so I got bit.
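
To make the block/script difference concrete, here is a minimal sketch. It
assumes a later Perl that accepts the explicit \p{Block=...} and
\p{Script=...} forms (these are not the 5.6 or bleedperl names discussed
here), and tibetan_membership() is a hypothetical helper. U+0F48 is used
only as an illustration: it is an unassigned code point inside the Tibetan
block, so it is not necessarily the character that caused the problem above.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a code point falls in the Tibetan
# block, the Tibetan script, or both. Uses the explicit property forms,
# which are assumed to be available in the Perl being run.
sub tibetan_membership {
    my ($cp) = @_;
    my $ch = chr $cp;
    return {
        block  => ($ch =~ /\p{Block=Tibetan}/  ? 1 : 0),
        script => ($ch =~ /\p{Script=Tibetan}/ ? 1 : 0),
    };
}

# U+0F00 is TIBETAN SYLLABLE OM; U+0F48 is an unassigned gap in the block.
for my $cp (0x0F00, 0x0F48) {
    my $m = tibetan_membership($cp);
    printf "U+%04X block=%d script=%d\n", $cp, $m->{block}, $m->{script};
}
```

Code written against block semantics would accept both code points, while
code that silently gets script semantics would reject the second one.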

Programming Perl says:

    "Note that these 'In' properties are only testing to see if the
     character is in the block of characters allocated for that script."

Since scripts are closer to general categories (which use 'Is') than to
blocks, it might be appropriate to keep \p{In...} as the block, while
adding \p{Is...} to refer to the script:

                 perl5.6  bleedperl       *proposed*
                 -------  -------------   -------------
     InTibetan   block    script||block   block
     IsTibetan   error    script||block   script
     Tibetan     error    script||block   script||block

       script||block = "script if available, block otherwise"

This preserves existing semantics, keeps the Is/In distinction consistent,
yet allows easy access to the superior script concept.
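
The *proposed* column can be read as a small lookup rule. The following is
only a sketch of that rule, not how perl resolves property names:
resolve_proposed() and the %SCRIPT and %BLOCK tables are hypothetical and
hold a few sample names for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample tables; a real implementation would derive these
# from the Unicode data files.
my %SCRIPT = map { $_ => 1 } qw(Tibetan Latin);
my %BLOCK  = map { $_ => 1 } qw(Tibetan BoxDrawing);

# Resolve a property name under the proposed rules:
#   In<Name> -> block only
#   Is<Name> -> script only
#   <Name>   -> script if available, block otherwise
# Returns "block:<Name>", "script:<Name>", or undef for an error.
sub resolve_proposed {
    my ($name) = @_;
    if ($name =~ /^In(.+)$/ && $BLOCK{$1})  { return "block:$1" }
    if ($name =~ /^Is(.+)$/ && $SCRIPT{$1}) { return "script:$1" }
    return "script:$name" if $SCRIPT{$name};
    return "block:$name"  if $BLOCK{$name};
    return undef;
}

for my $name (qw(InTibetan IsTibetan Tibetan BoxDrawing IsBoxDrawing)) {
    my $r = resolve_proposed($name);
    printf "%-13s %s\n", $name, defined $r ? $r : 'error';
}
```

Under this rule a bare name stays convenient, while the In and Is prefixes
each have exactly one meaning, so no separate {TibetanBlock} style name is
needed to force the block.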

Another benefit is that it allows us to get rid of the {TibetanBlock}
names, which are sometimes there and sometimes not (they are there when
there's a script with the same name, so you still have to go to the man
page every time to see which to use.)

