Re: PATCH: partial [perl #58182]: regex case-sensitive matching now utf8ness independent

karl williamson
December 11, 2009 11:20
Gerard Goossen wrote:
> What I am missing in the discussion is that on average existing code
> would be improved by changing the semantics, and thus instead of
> thinking about possibly breaking 20% of CPAN we are fixing 80% of CPAN.
> If we want this to be the default at any time in the future, we should
> do it now, because I don't see how having another release cycle would
> change anything.

I'm thinking that if we don't make it the default now, it would give
people a chance to switch to it if they want, and a chance for module
authors to check their code.  If they don't, well, they did have a
chance, as opposed to us springing it on them with no time to react.

> More specifically, about the failures caused by the changes:
> The pod stuff is breaking because it expects a non-breakable space to
> be matched by \s. As far as I know it is about the only module
> expecting this behaviour (which is probably broken because it
> currently depends on the utf8-ness of the scalar). I did a similar
> change in Perl Kurila and what I remember is that only the pod module
> had problems with it. I'll check whether I can find the changes to the
> pod module, which make them work without using the "use legacy
> 'unicode8bit'".
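
For illustration, here is a minimal sketch of the dependence on utf8-ness described above. It assumes the existing default semantics, with no pragma in scope that changes how the regex engine treats characters 128-255. The variable names are made up for the example.

```perl
use strict;
use warnings;

# U+00A0 NO-BREAK SPACE stored as a single native byte (UTF8 flag off).
my $bytes = "\xA0";

# The same character, with the internal representation upgraded to
# UTF-8 (UTF8 flag on). The two strings still compare equal.
my $upgraded = "\xA0";
utf8::upgrade($upgraded);

# Assumption: default regex semantics are in effect, so \s applies
# Unicode rules only to the string whose UTF8 flag is on.
my $bytes_match    = $bytes    =~ /\s/ ? 1 : 0;
my $upgraded_match = $upgraded =~ /\s/ ? 1 : 0;

printf "strings equal:      %s\n", $bytes eq $upgraded ? "yes" : "no";
printf "bytes    =~ /\\s/:   %d\n", $bytes_match;
printf "upgraded =~ /\\s/:   %d\n", $upgraded_match;
```

Under the current behaviour the two strings should compare equal while only the upgraded one matches \s, which is the inconsistency the patch is meant to remove.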

How much of CPAN did you actually try on Kurila?

I actually did find the lines that needed changing in all the modules
except Test::Harness.  They were in the wrap functions, and in some
cases in another function as well.  I was starting to fix them there, but
realized I didn't know enough about what their input character set
domain was supposed to be.

> I am surprised at the failure of Test::Harness; if anything I would
> expect the change to fix it. Looking at ...\YAMList\ it uses \s to
> match space characters, but according to YAML a non-breaking space
> isn't a space (and thus it would be part of the 80% of CPAN which
> would be fixed by the change).
> Karl: could you find out why it fails? I suspect that there is
> something having some (unwanted) side effect (which probably isn't
> wrong or shouldn't have any effect on code, but might be easily
> prevented).

I actually don't feel I have the time to spend on this.  The test that 
failed talked about Unprintables, and the failure was with the no break 

> Another class of failures are those that depend on the current
> behaviour to test the internals, like the POSIX/t/time.t test, which
> uses the current behaviour to test that the utf8 flag is not set. This is
> simply broken, and it should simply use utf8::is_utf8.
> Gerard Goossen
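
As a rough sketch of the suggestion above, a test can ask about the flag directly instead of inferring it from how a regex treats high-bit characters. The helper name utf8_flag_is_off is hypothetical and is not taken from POSIX/t/time.t.

```perl
use strict;
use warnings;

# Hypothetical helper for a test script: reports whether the UTF8 flag
# is off, without relying on \s or \w matching behaviour.
sub utf8_flag_is_off {
    my ($string) = @_;
    return utf8::is_utf8($string) ? 0 : 1;
}

my $native = "caf\xE9";       # e-acute as one native byte, flag off
my $upgraded = $native;
utf8::upgrade($upgraded);     # same characters, flag on

my $native_off   = utf8_flag_is_off($native);
my $upgraded_off = utf8_flag_is_off($upgraded);

print "native flag off:   $native_off\n";
print "upgraded flag off: $upgraded_off\n";
```

A check written this way should keep giving the same answer whether or not the regex semantics change, since it never looks at matching behaviour.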
