Re: [perl #107008] UTF8 patches for 5.16
From: Karl Williamson
Date: March 24, 2012 14:28
Subject: Re: [perl #107008] UTF8 patches for 5.16
Message ID: 4F6E3C5F.2040809@khwilliamson.com
On 03/24/2012 02:27 PM, Father Chrysostomos via RT wrote:
> On Sat Mar 24 12:40:24 2012, public@khwilliamson.com wrote:
>> On 03/23/2012 03:51 PM, Father Chrysostomos via RT wrote:
>>> Karl Williamson: You have some comments at the end of ticket perl #73022
>>> that imply that you are/were working on this bug. Can you tell us what
>>> the status is?
>>
>> What I was referring to was not the overall bug, but that Abigail
>> persuaded me that we should restrict the user-defined aliases in
>> "\N{...}" to begin with letters. I did add code to toke.c to do this
>> (beginning in today's blead at line #3363).
>>
>> However, that code doesn't check for above-Latin1 characters, as until
>> this patch is applied, it doesn't matter. If we apply this patch in
>> 5.16, we need to revisit what we accept as characters in a name (I
>> personally have learned some things about Unicode since then, for
>> example, and also need to refresh my memory about this issue), and patch
>> this code as well.
>>
>> The two-release deprecation cycle ends with 5.16, so that 5.18 can
>> actually forbid such names.
>>
>> Given these reasons, I think it advisable to wait until 5.18 to fix this
>> bug.
>>
>>> More importantly, can you tell us whether you think the
>>> patch attached (and at
>>> <https://github.com/Hugmeir/gsoc-pad-utf8-safety/commit/a885abdbb>) is
>>> appropriate?
>>>
>>
>> I haven't been following the design of these fixes, so I don't
>> understand (without more effort) how the patch works. Otherwise, it
>> looks good to me; I imagine you would be qualified to immediately
>> determine if it looks like the similar patches that have been applied.
>> I would like a test added where the requested name has not been defined,
>> so that we could verify that the error message that gets output looks
>> sane with above-Latin1 characters.
>
> What the patch does is stop
>
> use utf8;
> $foo = "\N{ÿ}";
>
> from being interpreted as "\N{Ã¿}", where the former is \xff in a UTF-8
> source file, and the latter is the UTF-8 octet sequence for \xff
> interpreted as Latin-1.
That sounds reasonable.
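
The misreading described in the quoted text can be reproduced outside the tokenizer. The following is only a sketch using the core Encode module; the helper name misread_as_latin1 is hypothetical and is not part of the patch or of toke.c. It encodes U+00FF as UTF-8 and then decodes the resulting octets as Latin-1, which yields two characters instead of one.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper for illustration: take a character string, produce its
# UTF-8 octets, and then (wrongly) treat each octet as one Latin-1 character.
sub misread_as_latin1 {
    my ($chars) = @_;
    my $octets = encode('UTF-8', $chars);    # U+00FF becomes the octets C3 BF
    return decode('ISO-8859-1', $octets);    # each octet becomes one character
}

my $misread = misread_as_latin1("\x{FF}");

# Show the code points of the misread string, one per character.
print join(' ', map { sprintf 'U+%04X', ord } split //, $misread), "\n";
```

The single character U+00FF comes back as the pair U+00C3, U+00BF, which is the name the unpatched code would look up.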
>
> If there are to be more changes later to \N{...}, I don’t know that it’s
> so necessary to include this patch now.
>
I'm saying we *should not* include it until we have made the changes to
\N{} that restrict the characters used in the name to be legitimate
ones. Otherwise, we have a back compat problem when we do make those
changes. Since none of these can work now, there isn't an issue until
this patch is applied.

However, a simple change for 5.16 to accommodate this patch could be to
just forbid explicitly all above-Latin1 characters. Code exists
currently to check the Latin1 characters even in UTF-8 (though it may
never have been tested because of this bug). In a later release we
could relax this requirement.

I'm willing to make this change and test if it is deemed desirable in 5.16.

The rule for Latin1 characters is that a name must begin with an
alphabetic, and contain only \w plus space, no-break space, parentheses
(because of existing Unicode names), and colons (because the name could
be of the form: 'Greek: alpha').
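
For illustration, the rule above, combined with the proposed 5.16 restriction that forbids all above-Latin1 characters, could be sketched in Perl roughly as follows. This is not the toke.c code; the function name is_acceptable_alias is hypothetical, and the /u modifier (available from 5.14) is assumed so that Latin1-range letters are matched with Unicode semantics.

```perl
use strict;
use warnings;

# Hypothetical checker, not the toke.c implementation. Returns 1 if $name
# follows the Latin1 rule stated above and contains no above-Latin1
# characters, and 0 otherwise.
sub is_acceptable_alias {
    my ($name) = @_;

    # Proposed 5.16 restriction: reject any character above U+00FF.
    return 0 if $name =~ /[^\x{00}-\x{FF}]/;

    # Must begin with an alphabetic; the rest may be \w, space,
    # no-break space (U+00A0), parentheses, or colon.
    return $name =~ /\A[[:alpha:]][\w \x{A0}():]*\z/u ? 1 : 0;
}

for my $name ('Greek: alpha', '1st', "\x{3B1}") {
    printf "%-14s %s\n",
        join('', map { ord($_) > 0x7E ? sprintf('\\x{%X}', ord) : $_ } split //, $name),
        is_acceptable_alias($name) ? 'accepted' : 'rejected';
}
```

Under these assumptions 'Greek: alpha' is accepted, while '1st' (does not begin with an alphabetic) and a name containing U+03B1 (above Latin1) are rejected.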