On Fri, Jan 28, 2011 at 08:30:22AM -0700, Karl Williamson wrote:
> On 01/28/2011 02:05 AM, Abigail wrote:
>> On Thu, Jan 27, 2011 at 11:15:23PM -0700, Karl Williamson wrote:
>>> As mentioned briefly earlier in this list, \p{} uses Unicode rules
>>> always regardless of the locale.
>>>
>>> commit fb2e24cdda774d9e9c28f1cd0356bba9070894c7
>>> Author: Karl Williamson <public@khwilliamson.com>
>>> Date: Thu Jan 27 22:29:51 2011 -0700
>>>
>>> regcomp: Add warning if tries to use \p in locale.
>>>
>>> \p implies Unicode matching rules, which are likely going to be
>>> different than the locale's.
>>
>>
>> So, people that use \p in their patterns so that they aren't affected
>> by whatever locale may be in effect get rewarded for their good behaviour
>> by a new warning?
>
> It isn't good behavior. A regular expression using locale shouldn't be
> using \p; it should instead be using [:posix:] (and I need to rewrite
> the perldiag to indicate this; thanks for bringing this out).
>
> Suppose the locale is Latin7, Greek. The character at 0xD7 there is a
> capital Chi. But, \p thinks it is a multiplication sign. So, most any
> \p, \pL for example, will give the wrong result. [[:alpha:]] should
> give the correct result.

But if locale is in effect (due to "use locale"), one may have strings
that are *not* in the locale. Locale may be used because user interaction
happens in a specific, configurable locale, but the program may also
have strings that come from some other source which doesn't use a locale.

Now, one could wrap any such match inside a 'no locale;' block; but one
may also have used \p properties instead.

Abigail
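
For illustration only, here is a minimal sketch of the 0xD7 case from the
quoted message. It assumes the Encode module and its 'iso-8859-7' (Latin7,
Greek) encoding are available; is_unicode_letter is a hypothetical helper
invented for this example and is not part of the commit or of perl itself.
Because \p always applies Unicode rules, the undecoded byte is judged as
U+00D7, while the decoded string is judged as the Greek letter, so the two
checks should disagree.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if the first character of $str is a
# letter under Unicode rules (which is what \p always uses), else 0.
sub is_unicode_letter {
    my ($str) = @_;
    return $str =~ /\A\pL/ ? 1 : 0;
}

# A single byte as it might arrive from a Latin7 (ISO-8859-7) source.
my $byte = "\xD7";

# Decoding maps the byte to the character the Latin7 author meant.
my $decoded = decode('iso-8859-7', $byte);

printf "undecoded 0xD7 is a letter: %d\n", is_unicode_letter($byte);
printf "decoded   0xD7 is a letter: %d\n", is_unicode_letter($decoded);
```

Strings that are decoded to characters on input in this way can be matched
with \p regardless of which locale is in effect, which is the situation the
reply above describes.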