perl.perl5.porters | Postings from December 2009

Re: PATCH: partial [perl #58182]: regex case-sensitive matching now utf8ness independent

karl williamson
December 13, 2009 10:57
karl williamson wrote:
> Gerard Goossen wrote:
>> What I am missing in the discussion is that, on average, existing code
>> would be improved by changing the semantics, and thus instead of
>> thinking about possibly breaking 20% of CPAN, we are fixing 80% of
>> CPAN.
>> If we want this to be the default at any time in the future, we should
>> do it now, because I don't see how having another release cycle would
>> change anything.
> I'm thinking that if we don't make it the default now, it would give 
> people a chance to switch to it if they want, and a chance for module 
> authors to check their code.  If they don't, well, they did have a 
> chance, as opposed to us springing it on them with no time to react.
>> More specifically, about the failures caused by the changes:
>> The pod code is breaking because it expects a no-break space to
>> be matched by \s. As far as I know, it is about the only module
>> expecting this behaviour (which is probably broken anyway, because it
>> currently depends on the utf8-ness of the scalar). I made a similar
>> change in Perl Kurila, and as I remember, only the pod module
>> had problems with it. I'll check whether I can find the changes to the
>> pod module that make it work without using "use legacy
>> 'unicode8bit'".
> How much of CPAN did you actually try on Kurila?
> I actually did find the lines that needed changing in all the modules 
> except Test::Harness.  They were in the wrap functions and, in some 
> cases, in one other function as well.  I was starting to fix them there, but 
> realized I didn't know enough about what their input character set 
> domain was supposed to be.
>> I am surprised at the failure of Test::Harness; if anything, I would
>> expect the change to fix it. Looking at ...\YAMList\, it uses \s to
>> match space characters, but according to YAML a no-break space
>> isn't a space (so it would be part of the 80% of CPAN that
>> would be fixed by the change).
>> Karl: could you find out why it fails? I suspect that something is
>> having an (unwanted) side effect (which probably isn't wrong, or
>> shouldn't affect any code, but might be easily prevented).
> I actually don't feel I have the time to spend on this.  The test that 
> failed was about unprintables, and the failure was with the no-break 
> space.

I've had some more insight into this.  I believe it is a bug in the test.
I changed the order so that the no-break space wasn't first on the
line, and the test passed.  There is probably an s/^\s+// line in the module,
and it is reasonable for that to strip off a leading no-break space.
But the test assumes that it shouldn't.
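A minimal sketch of why a leading no-break space can survive or vanish depending on the string's internal representation. It assumes perl 5.14 or later: the explicit `/d` (legacy) and `/u` (Unicode) regex modifiers used here postdate this thread and stand in, roughly, for the old default rules and for the utf8ness-independent rules the patch is heading towards. The string and variable names are made up for illustration; this is not the code from Test::Harness.

```perl
use strict;
use warnings;

# Illustrative only: U+00A0 NO-BREAK SPACE followed by "foo".
# Requires perl 5.14+ for the /d and /u modifiers.
my $native   = "\xA0foo";   # stored as native 8-bit, utf8 flag off
my $upgraded = "\xA0foo";
utf8::upgrade($upgraded);   # same characters, utf8 flag on

# Legacy (/d) rules: whether \s matches U+00A0 depends on the
# internal representation, not on the characters themselves.
(my $legacy_native   = $native)   =~ s/^\s+//d;   # NBSP kept
(my $legacy_upgraded = $upgraded) =~ s/^\s+//d;   # NBSP stripped

# Unicode (/u) rules: both strings are treated alike, so the
# leading no-break space is stripped either way.
(my $uni_native   = $native)   =~ s/^\s+//u;
(my $uni_upgraded = $upgraded) =~ s/^\s+//u;

printf "legacy:  %d %d\n", length $legacy_native, length $legacy_upgraded;  # legacy:  4 3
printf "unicode: %d %d\n", length $uni_native,    length $uni_upgraded;     # unicode: 3 3
```

In released perls the same Unicode rules can also be switched on lexically with `use feature 'unicode_strings'`, instead of adding `/u` to each pattern.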

>> Another class of failures comes from tests that rely on the current
>> behaviour to test the internals, like POSIX/t/time.t, which
>> uses it to check that the utf8 flag is not set.  That is
>> simply broken; it should use utf8::is_utf8 instead.
>> Gerard Goossen
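A minimal sketch of the kind of check suggested above, with a made-up Latin-1 string standing in for whatever POSIX/t/time.t actually examines (this is not that file's code). It asks `utf8::is_utf8` about the internal flag directly, instead of inferring the flag from how a regex behaves. `utf8::upgrade` and `utf8::is_utf8` are core functions and need no `use utf8`.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical test data: "caf" followed by U+00E9 (e with acute accent).
my $plain    = "caf\xE9";   # native 8-bit storage, utf8 flag off
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same characters, utf8 flag on

# The two scalars hold the same characters, so they compare equal...
is($plain, $upgraded, 'same characters regardless of representation');

# ...while the internal flag is checked directly, not via /\s/ or similar.
ok(!utf8::is_utf8($plain),    'plain string is not utf8-flagged');
ok( utf8::is_utf8($upgraded), 'upgraded string is utf8-flagged');
```

Checking the flag explicitly keeps such a test valid whichever matching semantics end up as the default.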
