perl.perl5.porters | Postings from January 2012

Re: [perl #108164] regex property extensions: \p{X-Confusable=A} from UTS#39

From: Karl Williamson
Date: January 18, 2012 13:56
Subject: Re: [perl #108164] regex property extensions: \p{X-Confusable=A} from UTS#39
Message ID: 4F173FEA.1050909@khwilliamson.com

On 01/13/2012 08:26 AM, tchrist1 (via RT) wrote:
> # New Ticket Created by  tchrist1
> # Please include the string:  [perl #108164]
> # in the subject line of all future correspondence about this issue.
> #<URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=108164>
>
>
> Currently, there is no (reasonable) way for the user to implement
> properties like \p{X-Confusable=A} (that is, from UTS#39) on their own.
>
> I feel this is a bug; hence, this filing.
>
> Here are issues blocking the user-level implementation of such a scheme:
>
>   *  The super-annoying new restriction that all user-defined properties *must*
>      start with /^I[sn]/ for them to be paid any attention to.
>
>   *  There is no way to have "parameterized" \p{NAME=VALUE} user properties, even
>      when the NAME is an X-foo user name (let alone an X-VALUE user value for an
>      existing property.) Consider how X-Confusable=VALUE needs to be able to
>      take at a minimum, an arbitrary code point, and in fact probably an
>      arbitrary string, as its value.
>
>   *  Apropos locating user-defined properties, there may be concerns about which
>      package the pattern was compiled in versus which one it is executed in,
>      along with the related issue of serialization needed for qr// recompilation.
>
> Because it is not possible for the user to do this for himself, I
> necessarily request that it be fully implemented in the core for v5.18.
>
> Currently only user-defined binary properties are allowed, which is not good
> enough, because it's nuts to expect people to write a \p{Is_X-Confusable__A}
> binary property or similar ridiculousness.  Even worse, you'd have to have a
> special function for *EVERY POSSIBLE UNICODE CODE POINT*, and you could never
> do full strings.  You surely do not want a hundred thousand things in the
> symbol table -- or a million -- nor do you want a hundred thousand little
> "XConfus" *.pl files, either.
>
> Yes, that's asking a great deal, but we are given no choice: currently only
> the core can do this because of these bugs related to user properties.
>
> Therefore a perfectly reasonable alternative to implementing it in the core
> is *TO MAKE IT POSSIBLE* for a user to implement it as a module outside the
> core.  I would actually prefer that solution.  But right now, bugs get in
> the way, so an in-core implementation tracking UTS#39 is the only way to do
> this under current technology.
>
> See http://stackoverflow.com/a/8841591/471272 for elaboration of the
> "confusable" issue and proposed property, including how this relates
> to UTS#39.
>
> --tom
>
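
For concreteness, here is a minimal sketch of what the binary-only mechanism described in the ticket looks like today. The property name IsConfusableWithA is hypothetical, and the three code points are a hand-picked illustration (Latin, Greek and Cyrillic capital A), not the UTS#39 confusables data. Note that the sub name has to start with "Is" or "In", and that the target character is baked into the name, so each target needs its own sub.

```perl
use strict;
use warnings;

# Hypothetical user-defined binary property. The name must begin with
# "Is" or "In" for \p{} to find it. The list below is illustrative only;
# it is NOT generated from the UTS#39 confusables data.
sub IsConfusableWithA {
    # One hex code point (or a whitespace-separated range) per line.
    return <<'END';
0041
0391
0410
END
}

# Candidates: Latin A, Greek Alpha, Cyrillic A, and Latin B as a non-match.
my @candidates = ("A", "\x{391}", "\x{410}", "B");

# Keep only the single characters that carry the property.
my @matches = grep { /\A\p{IsConfusableWithA}\z/ } @candidates;

printf "U+%04X\n", ord for @matches;
```

There is nowhere in \p{IsConfusableWithA} to pass a value, which is the limitation the ticket complains about: covering a second target character means writing a second sub.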

I'm unable to find any mention of this property extension on the ICU web 
site.  Do they assume that the user wants the "Mixed-Script, Anycase 
Confusables", which is what I suspect?
