
Re: PATCH: partial [perl #58182]: regex case-sensitive matching now utf8ness independent

From: karl williamson
Date: December 13, 2009 10:57
Subject: Re: PATCH: partial [perl #58182]: regex case-sensitive matching now utf8ness independent
Message ID: 4B2538ED.5050002@khwilliamson.com

karl williamson wrote:
> Gerard Goossen wrote:
>> What I am missing in the discussion is that, on average, existing code
>> would be improved by changing the semantics, and thus instead of
>> thinking about possibly breaking 20% of CPAN we are fixing 80% of
>> CPAN.
>>
>> If we want this to be the default at any time in the future, we should
>> do it now, because I don't see how having another release cycle would
>> change anything.
> 
> I'm thinking that if we make it not the default now, it would give 
> people a chance to switch to it if they want, and a chance for module 
> authors to check their code.  If they don't, well, they did have a 
> chance, as opposed to us springing it on them with no time for reaction.
>>
>> More specific about the failures caused by the changes:
>>
>> The pod stuff is breaking because it expects a non-breakable space to
>> be matched by \s; as far as I know it is about the only module
>> expecting this behaviour (which is probably broken because it
>> currently depends on the utf8-ness of the scalar). I did a similar
>> change in Perl Kurila, and what I remember is that only the pod module
>> had problems with it. I'll check whether I can find the changes to the
>> pod module which make it work without using "use legacy
>> 'unicode8bit'".
> 
> How much of CPAN did you actually try on Kurila?
> 
> I actually did find the lines that needed changing in all the modules 
> except Test::Harness.  They were in the wrap functions, and in some 
> cases, another one as well.  I was starting to fix them there, but 
> realized I didn't know enough about what their input character set 
> domain was supposed to be.
>>
>> I am surprised at the failure of Test::Harness; if anything I would
>> expect the change to fix it. Looking at ...\YAMList\Reader.pm, it uses
>> \s to match space characters, but according to YAML a non-breaking
>> space isn't a space (and thus it would be part of the 80% of CPAN
>> which would be fixed by the change).
>>
>> Karl: could you find out why it fails? I suspect that there is
>> something having some (unwanted) side effect (which probably isn't
>> wrong or shouldn't have any effect on code, but might be easily
>> prevented).
> 
> I actually don't feel I have the time to spend on this.  The test that 
> failed talked about Unprintables, and the failure was with the no break 
> space.
>>

I have some more insight into this.  I believe it is a bug in the test. 
I changed the order so that the no-break space wasn't first on the 
line, and it passed.  There is probably an s/^\s+// line in the module, 
and it is reasonable for that to strip off a leading no-break space. 
But the test assumes that it shouldn't.
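
To illustrate the suspected mechanism, here is a minimal sketch.  It is 
not the actual Test::Harness code; the variable names are made up for 
the illustration, and it assumes a perl running with the traditional 
default rules (no "use feature 'unicode_strings'", no locale).  It shows 
the same s/^\s+// treating a leading no-break space differently 
depending on the utf8ness of the scalar:

```perl
use strict;
use warnings;

# Hypothetical input: a NO-BREAK SPACE (U+00A0) followed by four letters.
my $native   = "\xA0text";   # stored as native 8-bit, utf8 flag off
my $upgraded = $native;
utf8::upgrade($upgraded);    # same characters, utf8 flag on

# Apply the same substitution to a copy of each string.
(my $from_native   = $native)   =~ s/^\s+//;
(my $from_upgraded = $upgraded) =~ s/^\s+//;

# Under the traditional rules, \s matches U+00A0 only when the scalar
# (or the pattern) is stored as utf8, so only the second copy is stripped.
printf "native:   %d characters left\n", length $from_native;
printf "upgraded: %d characters left\n", length $from_upgraded;
```

Under the semantics being discussed, as I understand them, both copies 
would be stripped, which would explain a test failing only when the 
no-break space comes first on the line.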

>> Another class of failures are those that depend on the current
>> behaviour to test the internals, like the POSIX/t/time.t test, which
>> uses the current behaviour to test that the utf8 flag is not set. This
>> is simply broken; it should simply use utf8::is_utf8.
>>
>> Gerard Goossen
>>
> 
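
On the utf8::is_utf8 suggestion quoted above, a minimal sketch of 
checking the flag directly rather than inferring it from how a regex 
behaves.  This is not taken from POSIX/t/time.t; the strings and 
variable names are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical data: the same four characters in both internal encodings.
my $plain    = "caf\xE9";    # native 8-bit string, utf8 flag off
my $upgraded = $plain;
utf8::upgrade($upgraded);    # identical characters, utf8 flag on

# utf8::is_utf8 reports the internal flag itself, so the check does not
# depend on how \s, \w or case folding treat characters above 0x7F.
my $plain_flag    = utf8::is_utf8($plain)    ? 1 : 0;
my $upgraded_flag = utf8::is_utf8($upgraded) ? 1 : 0;

print "plain flag:    $plain_flag\n";
print "upgraded flag: $upgraded_flag\n";

# The two scalars still compare equal as strings; only the storage differs.
print "equal as strings\n" if $plain eq $upgraded;
```

A test written this way should keep working whether or not the regex 
semantics change.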

