perl.perl5.porters | Postings from August 2010

Re: qr stringification: why are xism always present? I'm worried about backward compatibility

karl williamson
August 2, 2010 17:09
demerphq wrote:
> On 1 August 2010 19:24, karl williamson wrote:
>> But anyway, the new stringification would be (?Cd-xism:foo) or
>> (?Cdmx-is:foo).  Is there a reason that the Cd needs to be output?  If not,
>> why do the 'xism' always have to be output?
> A short answer to why the /xism/ part has to be output:
> the (?...) fragment should be sufficient /alone/ to /exactly/ specify
> how the snippet is to match, otherwise it cannot be safely embedded
> into other patterns without the meaning changing.
> However, /p does NOT always get output as it cannot be disabled, it
> affects the behavior of the match buffers, not the semantics of the
> pattern so it need only be present /somewhere/ in the pattern.
> demerphq@gemini:~$ perl -le'print qr/foo/p'
> (?p-xism:foo)
> demerphq@gemini:~$ perl -le'print qr/foo/pmsix'
> (?pmsix:foo)
> I'd like to call your attention to a little problem that caused test
> failures when I tried changing it when I did the /p stuff:
> yorton@mc02ppcapp-03:~$ perl -le'print qr/foo/msix'
> (?msix:foo)
> yorton@mc02ppcapp-03:~$ perl -le'print qr/foo/'
> (?-xism:foo)
> Notice the order of the flags is mirrored. If you change this it /will/ break.

I have noticed that mirroring effect.  I suspect I've found all the 
places that would break in the core already, just by adding the 'Cd'. 
Using a flag that says use the compiled-in defaults would avoid this.
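
A minimal sketch of the two points in the quoted text above: an embedded qr// object carries its own flags into the enclosing pattern, and the prefix lists enabled flags in "msix" order and disabled flags in the mirrored "xism" order. The helper name old_style_prefix is hypothetical; it is not the core's code and only reproduces the outputs shown in this thread.

```perl
use strict;
use warnings;

# Embedding: the inner pattern keeps its own flags, so the outer /i
# does not change how "foo" matches.
my $inner = qr/foo/;          # case-sensitive
my $outer = qr/x${inner}y/i;  # only the outer part is case-insensitive

print "XfooY matches\n"        if "XfooY" =~ $outer;
print "xFOOy does not match\n" if "xFOOy" !~ $outer;

# Hypothetical helper (an illustration, not Perl's implementation):
# build the "(?...:" prefix in the format quoted in this thread.
sub old_style_prefix {
    my ($flags) = @_;                          # e.g. "mx" or "pmsix"
    my %on  = map { $_ => 1 } split //, $flags;
    my $p   = $on{p} ? 'p' : '';               # /p goes first, never negated
    my $set = join '', grep {  $on{$_} } qw(m s i x);
    my $off = join '', grep { !$on{$_} } qw(x i s m);   # mirrored order
    return "(?$p$set" . (length $off ? "-$off" : '') . ':';
}

print old_style_prefix($_), "foo)\n" for '', 'msix', 'p', 'pmsix', 'mx';
```

Any code that parses the prefix with a fixed pattern depends on exactly this ordering, which is why changing it breaks tests.
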
> The /p modifier is always placed at the front. I don't think you want
> to follow this with a two-letter modifier.
> I personally would not like to see:
> qr/foo/mxCd
> be turned into
> qr/(?Cdmx:...)/
> I would want to see:
> qr/(?mxCd:...)/

Good point that I had not considered.
> I have to say, in all the discussion about what modifiers to use, this
> aspect of choosing two-letter modifiers seems like it hasn't been
> completely thought through.
> With one-letter modifiers one can use a charclass to determine
> validity. With two-letter modifiers order becomes important and one
> must use more complex validation mechanisms.
> Also, the implementation would probably be considerably simpler with
> one-letter modifiers.
> I think perhaps we might end up regretting introducing two-letter
> modifiers. Especially as there is prior art for using /U and things
> like that elsewhere.
> This isn't just a bike-shed argument; there /are/ subtle issues involved
> here. Using a lot of /(?:...)/ has a measurable effect on the speed of
> parsing patterns. If we introduce more complexity into the process we
> will make this problem worse.
> I'm sorry to bring it up, though.

I agree that it isn't a bike shed argument.  I think that user interface 
issues should be held to a higher standard than internal things.  It is 
important that we get a decent user interface.
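
The validation concern in the quoted text can be sketched as follows. This is only an illustration, assuming the one-letter set "msixp" and the two-letter "Cd" modifier proposed in this thread; the function names are hypothetical, and neither check rejects repeated modifiers.

```perl
use strict;
use warnings;

# One-letter modifiers: a single character class validates any
# combination, in any order.
sub valid_one_letter {
    my ($mods) = @_;
    return $mods =~ /\A[msixp]*\z/ ? 1 : 0;
}

# With a two-letter modifier the string has to be tokenized: "Cd" is
# valid as a unit, but "dC", a lone "C", or a lone "d" is not.
sub valid_with_two_letter {
    my ($mods) = @_;
    return $mods =~ /\A(?:Cd|[msixp])*\z/ ? 1 : 0;
}

for my $mods (qw(mx mxCd Cdmx dCmx mCxd)) {
    printf "%-5s one-letter: %d  two-letter: %d\n",
        $mods, valid_one_letter($mods), valid_with_two_letter($mods);
}
```

The second check accepts "mxCd" and "Cdmx" but rejects "dCmx" and "mCxd", which is the order sensitivity the quoted text describes.
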

I'm actually agnostic on whether we should have one or two characters 
here, but two seemed to be the consensus before.  I made sure that the 
lower-case sub-modifiers I used weren't already in use, so that before 
5.14 ships we could pull out the uppercase prefix, or after it ships, we 
could make it optional, promoting them to regular modifiers.

As far as implementation, I've already done it, and was surprised at how 
little code it was.  So that really isn't an issue.  The question is what 
makes the best user interface.  I do want to move forward on this, having been 
stymied in the past several attempts.  I'll work on a post summarizing 
the options as I see them, for another round of comments.  In response 
to that, I would hope you would describe the prior art you mention.

But I do have a related implementation question.  perlreapi.pod 
describes how one can plug in another regex handler.  I believe that 
'use re debug' is such a plug-in.  What I don't see is how the writer of 
some other similar plug-in gets access to the various flag #defines, 
such as RXf_PMf_LOCALE, which is listed in the pod.  I've scanned the XS 
documentation, and don't see how such a module gets access to the flag 
#defines used in a general function.  Can they use any perl header?  If 
so, we have lots more backward compatibility worries than I thought. 
The only place that I see some of these exported is in defsubs.h, which I 
think is only for the B module, but I'm pretty clueless about that 
whole area of Perl.  Any answers, advice or documentation pointers would 
be appreciated.

I care about this because I would like to get rid of some #defines, but 
don't know which are really externally visible.  For example, the 
semantics settings of a regex (locale, unicode, native, etc.) are 
mutually exclusive, so we don't need a bit for every possibility.  We 
could store the information more compactly in a set of bits that form an 
enum, which matters because we are running out of available bits in the word.  But since 
RXf_PMf_LOCALE is listed in perlreapi, I don't think I can change that, 
even if I don't understand how anyone could gain access to its value. 
And if they hard-code the current value for it, then they could 
hard-code any other value, and so any moving around of bits would 
disrupt them.
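
A minimal sketch of the packing idea, written in Perl for illustration only. Every constant name, value, and bit position below is hypothetical and does not correspond to anything in Perl's headers; the point is that a few bits holding an enum can replace one flag bit per mutually exclusive choice.

```perl
use strict;
use warnings;

# Hypothetical layout, NOT the values in Perl's headers.
# A 2-bit field at bit 5 can hold up to four exclusive choices,
# where one-bit-per-choice would need a bit for each.
use constant {
    CHARSET_SHIFT   => 5,
    CHARSET_MASK    => 0b11,
    CHARSET_NATIVE  => 0,
    CHARSET_LOCALE  => 1,
    CHARSET_UNICODE => 2,
};

# Store a charset value in the field, leaving all other bits alone.
sub set_charset {
    my ($flags, $charset) = @_;
    $flags &= ~(CHARSET_MASK << CHARSET_SHIFT);    # clear the field
    return $flags | ($charset << CHARSET_SHIFT);   # write the new value
}

# Read the charset value back out of the field.
sub get_charset {
    my ($flags) = @_;
    return ($flags >> CHARSET_SHIFT) & CHARSET_MASK;
}

my $flags = 0b1;                                   # some unrelated flag bit
$flags = set_charset($flags, CHARSET_UNICODE);
printf "flags=%07b charset=%d\n", $flags, get_charset($flags);
$flags = set_charset($flags, CHARSET_LOCALE);
printf "flags=%07b charset=%d\n", $flags, get_charset($flags);
```

Under such a layout, a plug-in that tested a single hard-coded locale bit would silently read the wrong thing, which is the compatibility worry described above.
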
