develooper Front page | perl.perl5.porters | Postings from February 2011

Re: Pushed patch to warn if using \p in locale

From: Karl Williamson
Date: February 19, 2011 13:03
Subject: Re: Pushed patch to warn if using \p in locale
Message ID: 4D603003.5060209@khwilliamson.com

On 02/17/2011 01:05 PM, Karl Williamson wrote:
> On 02/14/2011 11:40 AM, Abigail wrote:
>> On Mon, Feb 14, 2011 at 11:12:38AM -0700, Karl Williamson wrote:
>>> On 02/01/2011 10:55 AM, Karl Williamson wrote:
>>>> On 02/01/2011 04:03 AM, Abigail wrote:
>>>>> On Fri, Jan 28, 2011 at 08:30:22AM -0700, Karl Williamson wrote:
>>>>>> On 01/28/2011 02:05 AM, Abigail wrote:
>>>>>>> On Thu, Jan 27, 2011 at 11:15:23PM -0700, Karl Williamson wrote:
>>>>>>>> As mentioned briefly earlier in this list, \p{} uses Unicode rules
>>>>>>>> always regardless of the locale.
>>>>>>>>
>>>>>>>> commit fb2e24cdda774d9e9c28f1cd0356bba9070894c7
>>>>>>>> Author: Karl Williamson<public@khwilliamson.com>
>>>>>>>> Date: Thu Jan 27 22:29:51 2011 -0700
>>>>>>>>
>>>>>>>> regcomp: Add warning if tries to use \p in locale.
>>>>>>>>
>>>>>>>> \p implies Unicode matching rules, which are likely going to be
>>>>>>>> different than the locale's.
>>>>>>>
>>>>>>>
>>>>>>> So, people that use \p in their patterns so that they aren't
>>>>>>> affected by whatever locale may be in effect get rewarded for
>>>>>>> their good behaviour by a new warning?
>>>>>>
>>>>>> It isn't good behavior. A regular expression using locale
>>>>>> shouldn't be using \p; it should instead be using [:posix:] (and I
>>>>>> need to rewrite the perldiag to indicate this; thanks for bringing
>>>>>> this out).
>>>>>>
>>>>>> Suppose the locale is Latin7, Greek. The character at 0xD7 there is a
>>>>>> capital Chi. But, \p thinks it is a multiplication sign. So, most any
>>>>>> \p, \pL for example, will give the wrong result. [[:alpha:]] should
>>>>>> give the correct result.
>>>>>
>>>>>
>>>>> But if locale is in effect (due to "use locale"), one may have strings
>>>>> that are *not* in the locale. Locale may be used because user
>>>>> interaction happens in a specific, configurable locale, but the
>>>>> program may also have strings that come from something else which
>>>>> doesn't use a locale.
>>>>>
>>>>> Now, one could wrap any such match inside a 'no locale;' block; but
>>>>> one also may have used \p properties instead.
>>>>
>>>> I'm having trouble grokking what you're saying. Perhaps an example
>>>> would help. It sounds like you're saying that someone wants to mix
>>>> locale and non-locale regexes and expects Perl to be able to intuit
>>>> which is which and work correctly for both.
>>>>
>>>> Now, we've added regex modifiers /l, /d so that they can override
>>>> locally the scope's default.
>>>>>
>>>>>
>>>>>
>>>>> Abigail
>>>>>
>>>>
>>>
>>> Off-list: I haven't gotten a reply from you about this, so I don't know
>>> if I convinced you, or you've been too busy, or you think I'm too
>>> stubborn to listen, or what. But I'm considering adding a similar
>>> message for \N{}, as only when the locale is Latin-1 are the names 100%
>>> correct.
>>
>>
>> Sorry about that.
>>
>>
>> What I mean is situations like this:
>>
>>
>> sub pattern_for_foo {
>>     ...
>>     '...\p{Nd}...\p{L}...';   # Use \p instead of \w and \d so as *not*
>>                               # to be influenced by whatever locale may
>>                               # be in effect.
>> }
>>
>>
>> {
>>     use locale;               # Use whatever locale for user interaction.
>>
>>     my $val = fetch_from_file();
>>     my $pat = pattern_for_foo();
>>
>>     if ($val =~ /$pat/) {
>>         ...
>>     }
>> }
>>
>>
>> IMO, this should not warn. Using \p is the right thing - the pattern is
>> used in a place where locale is in effect, but it shouldn't be influenced
>> by any locale, because it's matched against something that isn't.
>>
>>
>> Sure, if \p is matched against something that's locale dependent, then
>> it's wrong. But we have no way of knowing. And warning in such cases is
>> wrong, especially if it's a regression.
>>
>
> What I'm getting is that people have used this method as a work-around
> for Perl's lack of tools in the past to distinguish between locale and
> non-locale regexes. But now we have the tools. And we have the usual
> tension between not adding warnings to code that has been written, and
> making it easier for code that is yet to be written. My view is that
> adding the message will save more total time for people in the world
> than not adding it. One idea I had is to somehow only enable the message
> when 'use 5.014' is specified. (I don't know how easy that is to
> implement.)
>
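
To make "the tools" mentioned above concrete, here is a minimal sketch of the /d and /u modifiers overriding a scope's default rules. It assumes perl 5.14 or later; the variable names are made up for illustration, and /l is left out on purpose because its result depends on whichever locale the environment selects.

```perl
use strict;
use warnings;

# Requires perl 5.14+, where the /d, /u and /l regex modifiers exist.
use locale;    # the enclosing scope defaults to locale rules

# One native (non-UTF-8) character with code point 0xE9.
my $char = "\xE9";

# /d: default rules. For a non-UTF-8 string only ASCII word characters
# match \w, so this match fails whatever the locale is.
my $matches_d = $char =~ /\A\w\z/d ? 1 : 0;

# /u: Unicode rules. 0xE9 is U+00E9 LATIN SMALL LETTER E WITH ACUTE,
# which is a word character, so this match succeeds.
my $matches_u = $char =~ /\A\w\z/u ? 1 : 0;

print "/d: $matches_d, /u: $matches_u\n";
```

Both matches sit inside the scope of 'use locale', yet neither consults the locale, because the explicit modifier takes precedence over the scope's default.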

Now reverted, as the deadline for contentious patches had already passed.
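
As an illustration of the 0xD7 example quoted earlier in the thread, the following is a minimal sketch. It assumes that the Greek locale in question uses the ISO-8859-7 character set, and it uses the core Encode module in place of a real locale so that the result does not depend on which locales are installed; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A single byte 0xD7, as it might arrive from a Greek (ISO-8859-7) source.
my $byte = "\xD7";

# \p{} always applies Unicode rules, so the undecoded byte is treated as
# U+00D7 MULTIPLICATION SIGN, which is not a letter.
my $raw_is_letter = $byte =~ /\A\pL\z/ ? 1 : 0;

# Decoding maps the byte to U+03A7 GREEK CAPITAL LETTER CHI, which is.
my $char = decode('iso-8859-7', $byte);
my $decoded_is_letter = $char =~ /\A\pL\z/ ? 1 : 0;

printf "raw:     U+%04X letter=%d\n", ord($byte), $raw_is_letter;
printf "decoded: U+%04X letter=%d\n", ord($char), $decoded_is_letter;
```

This prints letter=0 for the raw byte and letter=1 for the decoded character. Decoding the input up front is one way to get a meaningful \pL result for such data; it is a different technique from matching with [[:alpha:]] under the locale.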
