perl.perl5.porters | Postings from February 2013

Re: [perl #116086] split "\x20" doesn't work as documented

Daniel Łukasiak
February 19, 2013 11:20
On 17/02/13 01:16, yves orton via RT wrote:
> On 13 December 2012 18:32, Daniel Lukasiak <> wrote:
>> # New Ticket Created by  Daniel Lukasiak
>> # Please include the string:  [perl #116086]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: >
>> This is a bug report for perl from,
>> generated with the help of perlbug 1.39 running under perl 5.17.7.
>> -----------------------------------------------------------------
>> Hi,
>> split() has a special case for " " and "\x20" so they work like \s+
> Umm. I wasn't aware that we document that "\x20" works the same as " ".
> It used to, as an implementation accident, but I don't believe that we
> document that it should.
> The docs look like this:
>     As a special case, specifying a PATTERN of space (' ') will
>     split on white space just as "split" with no arguments does.
>     Thus, "split(' ')" can be used to emulate awk's default behavior,
>     whereas "split(/ /)" will give you as many initial null fields
>     (empty string) as there are leading spaces. A "split" on "/\s+/"
>     is like a "split(' ')" except that any leading whitespace produces
>     a null first field. A "split" with no arguments really does a
>     "split(' ', $_)" internally.
> That doesn't say "\x20" works the same.
> We changed which level of the perl parser handles escapes intended for
> the regex engine.
> Before this change, \x20 was resolved to a space, so as far as the
> regex engine was concerned the pattern was " ".
> After this change, \x20 is delivered to the regex engine verbatim,
> and the \x20 form is not recognized by the heuristic that handles
> the " " case.
> This change was very desirable for many reasons, and as it doesn't
> actually contradict the docs, unless Ricardo says otherwise I consider
> this Not A Bug.
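
A minimal Perl sketch for checking the behavior discussed in this ticket on whatever perl is at hand: it splits the same input with ' ', "\x20" and / / and prints the fields each produces. The sample string and the output format are illustrative assumptions, not part of the report. Per the report, under 5.17.7 the "\x20" form did not get the awk-style special case; if split honors the documented special case, the ' ' and "\x20" lines report the same fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: two leading spaces and a run of three spaces.
my $input = "  alpha   beta";

my @literal_space = split ' ',    $input;  # documented awk-style special case
my @escaped_space = split "\x20", $input;  # the form this ticket is about
my @regex_space   = split / /,    $input;  # a real regex: no special case

# Print each variant's field count and fields so they can be compared.
for my $case (['" "', \@literal_space], ['"\x20"', \@escaped_space], ['/ /', \@regex_space]) {
    my ($label, $fields) = @$case;
    printf "%-6s => %d fields: (%s)\n", $label, scalar(@$fields),
        join(', ', map { sprintf "'%s'", $_ } @$fields);
}
```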

It looks like split's documentation was reworded around 5.16 and now
explicitly mentions "\x20"; see:

perldoc -f split

    As another special case, "split" emulates the default behavior of the
    command line tool awk when the PATTERN is either omitted or a literal
    string composed of a single space character (such as ' ' or "\x20",
    but not e.g. "/ /"). In this case, any leading whitespace in EXPR is
    removed before splitting occurs, and the PATTERN is instead treated as
    if it were "/\s+/"; in particular, this means that any contiguous
    whitespace (not just a single space character) is used as a separator.
    However, this special treatment can be avoided by specifying the
    pattern "/ /" instead of the string " ", thereby allowing only a single
    space character to be a separator.
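
A short Perl sketch of the behaviors the quoted documentation contrasts: the awk-style ' ' case, an explicit /\s+/, an explicit / /, and split with no arguments (which the older wording says really does split(' ', $_)). The sample line is a hypothetical input chosen to have leading whitespace and a run of spaces; the trailing comments show the fields each call returns, joined with '|'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line: two leading spaces, a run of three, then one.
my $line = "  foo   bar baz";

my @awk_mode  = split ' ',   $line;  # leading whitespace dropped, /\s+/ separator
my @ws_regex  = split /\s+/, $line;  # same separator, but an empty first field
my @one_space = split / /,   $line;  # every single space is a separator

$_ = $line;
my @no_args = split;                 # documented as split(' ', $_)

print join('|', @awk_mode),  "\n";   # foo|bar|baz
print join('|', @ws_regex),  "\n";   # |foo|bar|baz
print join('|', @one_space), "\n";   # ||foo|||bar|baz
print join('|', @no_args),   "\n";   # foo|bar|baz
```

Both /\s+/ and / / keep empty leading fields, but only / / also yields empty fields inside a run of spaces.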

Daniel Łukasiak
