perl.perl5.porters

Script runs and ASCII digits

From: Abigail
Date: August 16, 2018 01:31
Subject: Script runs and ASCII digits
Message ID: 20180816013801.GA5759@almanda.fritz.box

The documentation of script runs says:


   The rules used for matching decimal digits are somewhat different.
   Many scripts have their own sets of digits equivalent to the Western 0
   through 9 ones.  A few, such as Arabic, have more than one set.  For a
   string to be considered a script run, all digits in it must come from
   the same set, as determined by the first digit encountered. The ASCII
   "[0-9]" are accepted as being in any script, even those that have their
   own set.  This is because these are often used in commerce even in such
   scripts.  But any mixing of the ASCII and other digits will cause the
   sequence to not be a script run, failing the match.  As an example,

    qr/(*script_run: \d+ \b )/x

   guarantees that the digits matched will all be from the same set of 10.
   You won't get a look-alike digit from a different script that has a
   different value than what it appears to be.


This leads me to believe that

   "1\N{THAI DIGIT FIVE}"

should not match

   /^(*sr:\d+)$/


However, it does match, even though "1" and "\N{THAI DIGIT FIVE}" do not
belong to the same set of 10.
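
A minimal sketch for reproducing this, assuming Perl 5.28 or later (the
release that introduced script runs). The helper is_digit_script_run and
the two single-set control cases are only illustrative and are not taken
from the documentation. The result for the mixed string is printed rather
than asserted, since it is the behaviour in question and may depend on
the perl version:

```perl
use v5.28;    # script runs were introduced in Perl 5.28
use warnings;
no warnings 'experimental::script_run';    # the feature was experimental in 5.28

# Hypothetical helper: true if the whole string consists of \d characters
# that together form a single script run.
sub is_digit_script_run {
    my ($str) = @_;
    return $str =~ /^(*sr:\d+)$/ ? 1 : 0;
}

my @cases = (
    [ 'ASCII only'      => "15" ],
    [ 'Thai only'       => "\N{THAI DIGIT ONE}\N{THAI DIGIT FIVE}" ],
    [ 'ASCII then Thai' => "1\N{THAI DIGIT FIVE}" ],    # the case in question
);

for my $case (@cases) {
    my ($label, $str) = @$case;
    printf "%-16s %s\n", $label,
        is_digit_script_run($str) ? 'matches' : 'does not match';
}
```

The first two cases each use a single set of digits and serve as
controls; only the third mixes ASCII and Thai digits.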

Is this a bug in the implementation? Is the documentation wrong?
Or do I misunderstand the documentation?



Abigail
