Re: [perl #116086] split "\x20" doesn't work as documented

From: Daniel Łukasiak
Date: February 19, 2013 11:20
Subject: Re: [perl #116086] split "\x20" doesn't work as documented
Message ID: 51235FC9.2010005@estrai.com

On 17/02/13 01:16, yves orton via RT wrote:
> On 13 December 2012 18:32, Daniel Lukasiak <perlbug-followup@perl.org> wrote:
>> # New Ticket Created by  Daniel Lukasiak
>> # Please include the string:  [perl #116086]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=116086 >
>>
>>
>> This is a bug report for perl from estrai@estrai.com,
>> generated with the help of perlbug 1.39 running under perl 5.17.7.
>>
>>
>> -----------------------------------------------------------------
>>
>> Hi,
>> split() has a special case for " " and "\x20" so they work like \s+
>
> Umm. I wasn't aware that we document that "\x20" works the same as " ".
>
> It used to, as an implementation accident, but I don't believe that we
> document that it should.
>
> The docs look like this:
>
>     As a special case, specifying a PATTERN of space (' ') will split
>     on white space just as "split" with no arguments does.  Thus,
>     "split(' ')" can be used to emulate awk's default behavior,
>     whereas "split(/ /)" will give you as many initial null fields
>     (empty string) as there are leading spaces.  A "split" on "/\s+/"
>     is like a "split(' ')" except that any leading whitespace produces
>     a null first field.  A "split" with no arguments really does a
>     "split(' ', $_)" internally.
>
> That doesn't say "\x20" works the same.
>
> We changed which level of the perl parser handles escapes intended for
> the regex engine.
>
> Previous to this the \x20 would be resolved to a space, and as far as
> the regex engine was concerned the pattern would be " ".
>
> After this change the \x20 would be delivered to the regex engine
> verbatim and the \x20 form would not be recognized by the heuristic
> that handles the " " case.
>
> This change was very desirable for many reasons, and as it doesn't
> actually contradict the docs, unless Ricardo says otherwise I consider
> this Not A Bug.


Hi,
It looks like split's documentation was reworded around 5.16, and it
now explicitly mentions "\x20". See:

perldoc -f split

"As another special case, "split" emulates the default behavior of the 
command line tool awk when the PATTERN is either omitted or a literal 
string composed of a single space character (such as ' ' or "\x20", but 
not e.g. "/ /").  In this case, any leading whitespace in EXPR is 
removed before splitting occurs, and the PATTERN is instead treated as 
if it were "/\s+/"; in particular, this means that any contiguous 
whitespace (not just a single space character) is used as a separator.
However, this special treatment can be avoided by specifying the pattern 
"/ /" instead of the string " ", thereby allowing only a single space 
character to be a separator."
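
For reference, the distinction the quoted paragraph draws can be seen with
a small script. This is only a sketch: the sample string and variable names
are made up for illustration, and the "\x20" line is printed without a
stated expectation because its result is exactly what this ticket is
about and may differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical sample input: two leading spaces, and two spaces
# between "foo" and "bar".
my $line = "  foo  bar baz";

# ' ' (a single-space string): awk-style splitting. Leading whitespace
# is dropped and any run of whitespace acts as one separator.
my @awk = split ' ', $line;

# / / (a regex matching one space): every individual space is a
# separator, so leading and doubled spaces produce empty fields.
my @single = split / /, $line;

# /\s+/: runs of whitespace are one separator, but a leading run
# still produces an empty first field.
my @runs = split /\s+/, $line;

# "\x20": the case discussed in this thread. No expected result is
# given here; compare it against the three lines above on your perl.
my @hex = split "\x20", $line;

for my $case ( [ q{' '} => \@awk ], [ q{/ /} => \@single ],
               [ q{/\s+/} => \@runs ], [ q{"\x20"} => \@hex ] ) {
    my ( $label, $fields ) = @$case;
    printf "%-7s %d fields: [%s]\n", $label, scalar @$fields,
        join( '|', @$fields );
}
```

If the "\x20" line matches the ' ' line, the string is getting the awk
treatment described in the documentation; if it matches the / / line, it
is being handled as an ordinary single-space pattern.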

-- 
Daniel Łukasiak
