develooper Front page | perl.perl5.porters | Postings from October 2018

Re: [perl #133547] Inconsistency in Script Run

From: Karl Williamson
Date: October 2, 2018 14:09
Subject: Re: [perl #133547] Inconsistency in Script Run
Message ID: ec3d940f-53a2-e2f3-af76-c278a12fec90@khwilliamson.com

On 10/02/2018 03:57 AM, ph10@hermes.cam.ac.uk wrote:
> On Sun, 30 Sep 2018, Karl Williamson wrote:
> 
>> The fix for this should be put in 5.28.1.
> 
> I have downloaded v5.29.4 (v5.29.3-35-g4288c5b93b) and can confirm that
> all the issues I previously reported are fixed. However, there are still
> two oddities that don't seem to be right. The digit sequences FF10..FF19
> and 1D7CE..1D7FF (both in the Common script) don't seem to work as I
> expected them. A string containing them along with Latin characters is
> not valid as a script run in this testing Perl. Indeed, a string with
> only one of them and Latin characters doesn't match (which it surely
> should, regardless of being a digit, since it is in the Common script).
> Two of them on their own, without any Latin characters, do match.
> 
> These strings match the pattern /^(*sr:.{4})/
> 
>    \x{ff10}\x{ff19}..
>    \x{1d7ce}\x{1d7cf},,
>    
> These don't:
> 
>    A\x{ff10}\x{ff19}B
>    A\x{ff10}BC
>    A\x{1d7ce}\x{1d7cf}B
>    A\x{1d7ce}BC

Technically, this isn't a bug, but a design flaw.

My design was to allow only the ASCII digits 0-9 to be mixed with other 
scripts. The strings in your second batch are in the Latin script, and 
therefore the only digits from Common that are allowed are the ASCII ones.

But that is not what a reasonable person would expect, and so the design 
is wrong.

I see two choices:

1) Allow the non-ASCII digits that are considered Common to match the 
Latin script

2) Allow these to match any script, just like the ASCII ones already do.

The second solution seems more in keeping with Unicode's intent: since 
they made these digits Common, they should be allowed in multiple 
scripts. But the requirement that all digits in a run must come from the 
same sequence of 10 would remain.
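
To make the difference between the current design and choice 2 concrete, 
here is a rough toy model in Python. It is only a sketch of the rules as 
described in this message, not Perl's implementation. The names 
toy_script, digit_zero and is_script_run are hypothetical, and toy_script 
guesses the script from the character name, ignoring Inherited and 
Script_Extensions entirely.

```python
import unicodedata

def toy_script(ch):
    """Hypothetical stand-in for a real Script lookup (assumption: guess
    from the character name; anything unrecognized counts as Common)."""
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "GREEK", "CYRILLIC"):
        if name.startswith(script + " "):
            return script.capitalize()
    return "Common"

def digit_zero(ch):
    """Code point of the zero of ch's sequence of 10 decimal digits,
    or None if ch is not a decimal digit."""
    value = unicodedata.decimal(ch, None)
    return None if value is None else ord(ch) - value

def is_script_run(s, common_digits_match_any_script):
    """Toy check. The flag selects between the design described in this
    message (False) and choice 2 (True)."""
    script = None  # the single non-Common script seen so far
    zero = None    # zero of the digit sequence seen so far
    for ch in s:
        z = digit_zero(ch)
        if z is not None:
            if zero is not None and z != zero:
                return False  # digits from two different sequences of 10
            zero = z
        sc = toy_script(ch)
        if sc != "Common":
            if script is not None and sc != script:
                return False  # two different non-Common scripts
            script = sc
    # Current design: once a non-Common script is present, only the
    # ASCII digits may accompany it.
    if (not common_digits_match_any_script
            and script is not None
            and zero not in (None, ord("0"))):
        return False
    return True

if __name__ == "__main__":
    for s in ("\uff10\uff19..", "A\uff10\uff19B", "A\U0001d7ce\U0001d7cfB"):
        print(ascii(s),
              is_script_run(s, False),
              is_script_run(s, True))
```

Under choice 2 a string such as A\x{ff10}\x{1d7ce} would still be 
rejected in this model, because its two digits come from different 
sequences of 10.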

I'm open to hearing arguments either way, or some third way.
