2010/1/27 karl williamson <public@khwilliamson.com>:
> demerphq wrote:
>>
>> 2010/1/26 karl williamson <public@khwilliamson.com>:
>>>
>>> As a relative newcomer to this, I do say things here that reveal my
>>> ignorance, which some may mistake for stupidity. Sometimes I say things
>>> that are wrong and nobody corrects me, and I discover it after delving
>>> deeper in the code. I would appreciate any corrections people know
>>> about.
>>
>> Bah. You kick ass. And don't forget it. Even being willing to look at
>> this crap earns you a very large gold star.
>
> I actually wasn't looking for a compliment. I mean merely that I will say
> apparently stupid things on this list, and I'd rather get corrected than
> have silence.
You most likely hear silence not because you said something stupid but
simply because very few people know the code.
> To sum up the rest of your post, you think the toker should again parse
> \N{}.
Correct.
> That was the part I was missing. I also suspect that changing the tokenizer
> like this would be something that shouldn't be a late adder to any release,
> much less a frozen one. If so, then maybe 5.12 should go out without this
> fixed?
Given the history involved, I think this one might be OK.
After all, the toker did parse \N{} for years and years.
> And I agree that having the tokenizer do this requires new syntax. And,
> hence, the new syntax could not be deferred.
Right.
> Having regcomp.c do the translation allows us to not have new syntax, and it
> does solve Bug #56444. But it does not, by itself, solve the other one tied
> to the ticket: #62056. rt is down, but I can reconstruct the bug:
>
> use charnames ':full';
> my $x = 'foo';
> m/$x\N{LATIN CAPITAL LETTER A}/
>
> fails because (PL_hints & HINT_LOCALIZE_HH) is 0 by the time the regex is
> being compiled. That is, it appears that the fact that charnames is in
> effect is gone at execution time. I'm presuming that it is believed that
> this will become irrelevant if the toker does the parsing, including under
> all the eval situations. Yes? But isn't it also possible to make this flag
> available at run time as well? And would that be a lesser and safer change?
See, then we would end up with bug reports about how

  use charnames ':full';
  my $qr1= qr/\N{LATIN CAPITAL LETTER A}/;
  $^H{charnames} = sub{"f"};
  my $qr2= qr/\N{LATIN CAPITAL LETTER A}/;
  $^H{charnames} = sub{"foo"};
  my $x = 'foo';
  print "matched $1" if ($x=~/($qr1|$qr2|\N{LATIN CAPITAL LETTER A})/);

doesn't work as expected, and we might end up committing ourselves to
making it work in ways that we don't want to.

In other words, I feel that doing this the "easy" way right now comes
dangerously close to adding an infinite number of types of strings to
perl. It's bad enough that we have the utf8 flag, but if we end up with
the "use charnames handler" string or whatever, it won't be a step forward.
>
> Having restructured the part of the tokenizer that handles \N a year ago,
> I'm quite capable of easily changing it to return the new syntax should we
> decide to go that way. (Note that \N also means something else now to
> regexes, so that would probably have to be incorporated). I think it would
> be better for someone else to make the changes that cause the tokenizer to
> become involved again.
When you say the tokenizer in this context, do you mean toke.c or the
one in the regex engine?
If you are comfortable with that, then as far as I recall it's actually
pretty simple. There is a list of characters that are to be expanded
to literals or passed through. Anyway, we can/should be able to find
the commit that introduced this and more or less reverse it. In
theory. ;-)
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"