perl.perl5.porters | Postings from August 2010

qr stringification: why are xism always present? I'm worried about backward compatibility

karl williamson
August 1, 2010 10:24
I've been working on adding the regex modifiers for 
Unicode/locale/traditional semantics, and am almost ready.

However, the way I've implemented it breaks backward compatibility with 
things that rely on the current stringification of regexes.  I had to 
fix several .t's that did this (including one in CPAN).  I have no idea 
how prevalent this reliance is.

The reason is that the code always outputs the modifier, even for the 
existing semantics, so the stringification is permanently different 
from before.  That got me thinking that it should be possible to omit 
the modifier from the stringification unless it differs from the 
default.  Then backward compatibility would not be broken.  Still, it 
is dangerous for modules to rely on knowing all the possible 
modifiers: someone could pass them a regex compiled with a new modifier 
that they don't know how to deal with, and that goes for any new 
modifier, not just these.
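
To make the compatibility worry concrete, here is a minimal sketch of the kind of check that breaks, next to one that does not depend on the exact string.  The use of re::regexp_pattern is my own illustration, not something proposed in this message, and the sketch deliberately does not hard-code what any particular perl prints.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

my $re = qr/foo/;

# Fragile: hard-codes one particular stringification, so it stops passing
# as soon as the modifiers shown in the string change in any way.
my $fragile_ok = ("$re" eq '(?-xism:foo)') ? 1 : 0;

# Less fragile: ask perl for the pattern and its modifiers separately,
# instead of parsing them back out of the stringified form.
my ($pattern, $modifiers) = regexp_pattern($re);

print "stringified: $re\n";
print "pattern:     $pattern\n";
print "modifiers:   '$modifiers'\n";
print "fragile comparison ", ($fragile_ok ? "passed" : "failed"), "\n";
```

Code written like the second half still has to decide what to do with a modifier it has never seen, but at least it does not depend on the layout of the string.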

But there must be some reason that xism are always shown in the 
stringification, whether or not they are in effect, and I can't think 
what it might be.  It would have been simpler for the original design 
to output them only when they differ from the default, so there must 
be a reason why the plus or minus of each is always output.  Does that 
reason apply to these new modifiers?

Perhaps an example will clarify things.  Currently if you say qr/foo/, 
the stringification is (?-xism:foo).  If you instead say qr/foo/xm, the 
stringification is (?mx-is:foo).
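
The two cases can be reproduced with a short script.  This is only a sketch: modifier_part is a hypothetical helper of mine, and the exact strings printed depend on the perl used (the forms quoted above are what the perl under discussion produces; other versions may differ), so no expected output is shown.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the modifier portion out of a stringified
# regex, e.g. "-xism" out of "(?-xism:foo)".  Returns undef if the string
# does not have that shape.
sub modifier_part {
    my ($re) = @_;
    return "$re" =~ /\A\(\?([^:]*):/ ? $1 : undef;
}

my $plain = qr/foo/;
my $xm    = qr/foo/xm;

printf "%-10s => %s (modifier part: %s)\n",
    'qr/foo/', $plain, modifier_part($plain);
printf "%-10s => %s (modifier part: %s)\n",
    'qr/foo/xm', $xm, modifier_part($xm);
```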

My working plan is to have modifiers Cl for locale, Cu for Unicode, and 
Cd for dual.  'C' stands for character set; I've also toyed with 'S' 
for semantics.  'Dual' is so named because it behaves sometimes like the 
native character set, and sometimes like Unicode.

The new stringification would then be (?Cd-xism:foo) or 
(?Cdmx-is:foo).  Is there a reason that the Cd needs to be output?  If 
not, why do the 'xism' always have to be output?
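
The two alternatives can be modelled in a few lines of Perl.  This is a hypothetical sketch, not the actual implementation in the regex compiler: sketch_stringify and its arguments are invented for illustration, and the ordering of the "on" flags is an assumption chosen only so that the examples above are reproduced.

```perl
use strict;
use warnings;

# Hypothetical model of the stringification being discussed.
#   $pattern : the pattern text, e.g. "foo"
#   $on      : the x/i/s/m modifiers in effect, e.g. "xm"
#   $charset : 'd', 'l' or 'u' (the working Cd / Cl / Cu modifiers)
#   $always  : true  => always emit the character set modifier
#              false => emit it only when it differs from the default 'd'
sub sketch_stringify {
    my ($pattern, $on, $charset, $always) = @_;
    my %is_on = map { $_ => 1 } split //, $on;

    # Flag order is an assumption that reproduces (?mx-is:foo) and (?-xism:foo).
    my $plus  = join '', grep {  $is_on{$_} } qw(m s i x);
    my $minus = join '', grep { !$is_on{$_} } qw(x i s m);

    my $cs   = ($always || $charset ne 'd') ? "C$charset" : '';
    my $mods = $cs . $plus . (length($minus) ? "-$minus" : '');
    return "(?$mods:$pattern)";
}

print sketch_stringify('foo', '',   'd', 1), "\n";   # (?Cd-xism:foo)
print sketch_stringify('foo', 'xm', 'd', 1), "\n";   # (?Cdmx-is:foo)
print sketch_stringify('foo', '',   'd', 0), "\n";   # (?-xism:foo)
print sketch_stringify('foo', 'xm', 'd', 0), "\n";   # (?mx-is:foo)
print sketch_stringify('foo', '',   'u', 0), "\n";   # (?Cu-xism:foo)
```

With $always false, a regex compiled with the default semantics stringifies exactly as it does today, which is the backward-compatible behaviour I am asking about.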
