Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400

From: demerphq
Date: November 13, 2008 12:08
Subject: Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400
Message ID: 9b18b3110811131208h1f1eb6c2l9c71ccfcfc27bdc2@mail.gmail.com

2008/11/13 karl williamson <public@khwilliamson.com>:
> demerphq wrote:
[snip]
>> Assuming that grok_oct() consumes at most 3 octal digits, I think we
>> can apply Karl's patch. However, I do think we should recommend against
>> using octal IN REGULAR EXPRESSIONS. And should note that while you CAN
>> use octal to represent codepoints up to 511 it is strongly recommended
>> that you don't.
>>
>> Also I have a concern that Karl's patch merely modifies the behaviour
>> in the regular expression engine. It doesn't do the same for other
>> strings. If it is going to be legal it should be legal everywhere.
>>
> grok_oct() itself consumes as many octal digits as there are in its
> parameter, as long as the result doesn't overflow a UV.  It is used for
> general purpose octal conversion, such as from the oct() function.

Somewhere, though, we have to have a limit on the number of digits, don't we?

(I'm very tired right now and haven't looked)
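
For what it's worth, the two behaviours can be compared from plain Perl. This is only a sketch, based on the assumption that double-quoted string escapes stop after three octal digits while oct() keeps going. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# oct() hands the whole digit string to the conversion routine.
my $from_oct = oct("1234");

# Assumption: a backslash-octal escape in a double-quoted string takes
# at most three digits, leaving the trailing "4" as a literal character.
my $str = "\1234";

printf "oct('1234') = %d\n", $from_oct;
printf "string length = %d, first ord = %d, rest = '%s'\n",
    length($str), ord($str), substr($str, 1);
```

If the assumption holds, the string is two characters long rather than one.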

> My patch was to bring consistency to the handling of \400-\777.  Outside
> re's, putting them into a string variable will cause the string to be
> converted to utf8, and so they will be converted into two utf8 bytes as part
> of that string.  Similarly, using any of these octal values in an re
> charclass will cause the re to be converted to utf8, and will match the
> corresponding unicode code point.  But when values in this range appear in
> an re outside a charclass there is an inconsistency.  On an 8-bit character
> machine (if there aren't 256 or so parenthetical sub expressions in the re)
> they will match a two character sequence, but not the same utf8 sequence
> matched if they had instead appeared in a charclass.  I'm not sure what
> would happen on a 9-bit machine.  It might very well be what Glenn suggests,
> the corresponding 9 bits.
>
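
A minimal sketch of the three cases Karl lists, for trying on whatever perl is at hand. Nothing here is taken from the patch. The variable names are illustrative, and the last line is the case under discussion, so its result depends on the perl version.

```perl
use strict;
use warnings;

my $target = chr(0400);            # code point 256

# 1) Octal escape in a string: one character, stored as utf8.
my $str = "\400";
printf "string:    ord=%d length=%d utf8=%s\n",
    ord($str), length($str), utf8::is_utf8($str) ? "yes" : "no";

# 2) Octal escape inside a character class.
printf "charclass: %s\n", $target =~ /[\400]/ ? "match" : "no match";

# 3) Octal escape outside a character class (the inconsistent case).
printf "bare:      %s\n", $target =~ /\400/ ? "match" : "no match";
```
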
> Tom has pointed out that \777 is a reserved value in some contexts.

Oh? I missed that.

> It seems to me to be a bad idea to remove acceptance of octal numbers in
> re's.

Yes, I think that's been well demonstrated.

> It seems like a good idea to add something to the language so one can
> express them unambiguously.  Even I with my limited knowledge of regcomp.c
> could do it easily (fools rush in...).

Sometimes being able to do something is simply not having the fear
that it might be too hard. :-)

> And it seems like an even better idea to handle them consistently.  I see
> two ways to do that 1) accept my patch; or

That's pretty much a given. I just haven't had the time yet.

And, well, while it was being contentiously debated I wanted to wait and
see a bit. :-)

> 2) forbid or warn about the use
> of those larger than a single character in the machine architecture in both
> strings and re's, including char classes.

I need to think about this one.

> Perhaps I've forgotten something in this thread.  If so, I'm sorry.

Please don't be sorry. For me you are a welcome breath of fresh air. It's
wonderful to have you on board.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
