perl.perl5.porters | Postings from February 2013

Re: [perl #116086] split "\x20" doesn't work as documented

From: demerphq
Date: February 19, 2013 11:44
Subject: Re: [perl #116086] split "\x20" doesn't work as documented
Message ID: CANgJU+WMyJWnYL4eSrp=eDOs_XQ35GYnvJ5w--O6=WtUgmER5w@mail.gmail.com

On 19 February 2013 12:19, Daniel Łukasiak <estrai@estrai.com> wrote:
> On 17/02/13 01:16, yves orton via RT wrote:
>>
>> On 13 December 2012 18:32, Daniel Lukasiak <perlbug-followup@perl.org>
>> wrote:
>>>
>>> # New Ticket Created by  Daniel Lukasiak
>>> # Please include the string:  [perl #116086]
>>> # in the subject line of all future correspondence about this issue.
>>> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=116086 >
>>>
>>>
>>> This is a bug report for perl from estrai@estrai.com,
>>> generated with the help of perlbug 1.39 running under perl 5.17.7.
>>>
>>>
>>> -----------------------------------------------------------------
>>>
>>> Hi,
>>> split() has a special case for " " and "\x20" so they work like \s+
>>
>>
>> Umm. I wasn't aware that we document that "\x20" works the same as " ".
>>
>> It used to, as an implementation accident, but I don't believe that we
>> document that it should.
>>
>> The docs look like this:
>>
>>     As a special case, specifying a PATTERN of space (' ') will split
>>     on white space just as "split" with no arguments does.  Thus,
>>     "split(' ')" can be used to emulate awk's default behavior, whereas
>>     "split(/ /)" will give you as many initial null fields (empty
>>     string) as there are leading spaces.  A "split" on "/\s+/" is like
>>     a "split(' ')" except that any leading whitespace produces a null
>>     first field.  A "split" with no arguments really does a
>>     "split(' ', $_)" internally.
>>
>> That doesn't say "\x20" works the same.
>>
>> We changed which level of the perl parser handles escapes intended for
>> the regex engine.
>>
>> Before this change the \x20 would be resolved to a space, and as far as
>> the regex engine was concerned the pattern would be " ".
>>
>> After this change the \x20 would be delivered to the regex engine
>> verbatim and the \x20 form would not be recognized by the heuristic
>> that handles the " " case.
>>
>> This change was very desirable for many reasons, and as it doesn't
>> actually contradict the docs, unless Ricardo says otherwise I consider
>> this Not A Bug.
>
>
>
> Hi,
> it looks like split's documentation was reworded around 5.16 and it now
> explicitly mentions "\x20", vide:
>
> perldoc -f split
>
> "As another special case, "split" emulates the default behavior of the
> command line tool awk when the PATTERN is either omitted or a literal string
> composed of a single space character (such as ' ' or "\x20", but not e.g.
> "/ /").  In this case, any leading whitespace in EXPR is removed before
> splitting occurs, and the PATTERN is instead treated as if it were "/\s+/";
> in particular, this means that any contiguous whitespace (not just a single
> space character) is used as a separator. However, this special treatment can
> be avoided by specifying the pattern "/ /" instead of the string " ",
> thereby allowing only a single space character to be a separator."

Hrm. I have a weird feeling I was involved in that change, so I guess
I have to eat my words.

It also seems to support the contention that FC's patch needs to be reverted.
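
For anyone wanting to compare the cases discussed above, a minimal sketch
follows. The sample string and the show() helper are made up for
illustration. The "\x20" line is deliberately left without an expected
result, since its behaviour depends on the perl version (the ticket reports
that 5.17.7 does not treat it like ' ').

```perl
use strict;
use warnings;

# Sample input (hypothetical): two leading spaces, and a double space
# between "b" and "c".
my $str = "  a b  c";

# Illustrative helper: quote each field so empty fields are visible.
sub show { return join ",", map { qq("$_") } @_ }

# A single-space string: awk-style, leading whitespace is skipped.
print "' '    : ", show(split ' ', $str), "\n";    # "a","b","c"

# A single-space regex: every space separates, so empty fields appear.
print "/ /    : ", show(split / /, $str), "\n";    # "","","a","b","","c"

# Whitespace runs: like ' ', but with one leading empty field.
print "/\\s+/  : ", show(split /\s+/, $str), "\n";  # "","a","b","c"

# The case under dispute; compare this line against the ' ' line above.
print '"\x20" : ', show(split "\x20", $str), "\n";
```

If the last line matches the first, the perl in use treats "\x20" as the
special case described in the reworded docs.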

cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
