perl.perl5.porters | Postings from March 2012

[perl #107008] UTF8 patches for 5.16

From: Father Chrysostomos via RT
Date: March 24, 2012 13:27
Subject: [perl #107008] UTF8 patches for 5.16
Message ID: rt-3.6.HEAD-4610-1332620854-1930.107008-15-0@perl.org

On Sat Mar 24 12:40:24 2012, public@khwilliamson.com wrote:
> On 03/23/2012 03:51 PM, Father Chrysostomos via RT wrote:
> > Karl Williamson: You have some comments at the end of ticket perl #73022
> > that imply that you are/were working on this bug.  Can you tell us what
> > the status is?
> 
> What I was referring to was not the overall bug, but that Abigail 
> persuaded me that we should restrict the user-defined aliases in 
> "\N{...}" to begin with letters.  I did add code to toke.c to do this 
> (beginning in today's blead at line #3363).
> 
> However, that code doesn't check for above-Latin1 characters, as until 
> this patch is applied, it doesn't matter.  If we apply this patch in 
> 5.16, we need to revisit what we accept as characters in a name (I 
> personally have learned some things about Unicode since then, for 
> example, and also need to refresh my memory about this issue), and patch 
> this code as well.
> 
> The two-release deprecation cycle ends with 5.16, so that 5.18 can 
> actually forbid such names.
> 
> Given these reasons, I think it advisable to wait until 5.18 to fix this 
> bug.
> 
> > More importantly, can you tell us whether you think the
> > patch attached (and at
> > <https://github.com/Hugmeir/gsoc-pad-utf8-safety/commit/a885abdbb>) is
> > appropriate?
> >
> 
> I haven't been following the design of these fixes, so I don't 
> understand (without more effort) how the patch works.  Otherwise, it 
> looks good to me; I imagine you would be qualified to immediately 
> determine if it looks like the similar patches that have been applied. 
> I would like a test added where the requested name has not been defined, 
> so that we could verify that the error message that gets output looks 
> sane with above-Latin1 characters.

What the patch does is stop

use utf8;
$foo = "\N{ÿ}";

from being interpreted as "\N{Ã¿}", where the former is \xff in a UTF-8
source file, and the latter is the UTF-8 octet sequence for \xff
interpreted as Latin-1.
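
The misreading described above can be reproduced outside the tokenizer.
The following is only a sketch using the Encode module, not the code
path that toke.c takes; the helper hex_of and the variable names are
made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render a string's characters as hex code points.
sub hex_of {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $str;
}

# U+00FF, written as an escape so the sketch does not depend on the
# encoding of the file it is saved in.
my $char = "\x{FF}";

# The octets a UTF-8 source file contains for that character.
my $octets = encode('UTF-8', $char);

# Treating those octets as Latin-1 yields two characters (the bug).
my $misread = decode('ISO-8859-1', $octets);

# Treating them as UTF-8 recovers the single intended character.
my $correct = decode('UTF-8', $octets);

print "octets in file:  ", hex_of($octets),  "\n";
print "read as Latin-1: ", hex_of($misread), " (length ", length($misread), ")\n";
print "read as UTF-8:   ", hex_of($correct), " (length ", length($correct), ")\n";
```

The Latin-1 reading has length 2, which is why a name looked up that
way cannot match an alias defined with the single character.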

If there are to be more changes later to \N{...}, I don’t know that it’s
so necessary to include this patch now.

-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=107008
