
Re: Simple things should be simple (was: Re: Smack!)

From: Abigail
Date: April 20, 2007 08:33
Subject: Re: Simple things should be simple (was: Re: Smack!)
Message ID: 20070420153306.GE9598@abigail.nl

On Fri, Apr 20, 2007 at 09:23:19AM -0600, Tom Christiansen wrote:
> 
> I don't really think it's "fair" that uc/lc/lcfirst/ucfirst/regexp
> classes work fine on all characters but those 128..255.  That is,
> 0..127 work fine, and 256..inf work fine, but not the middle ones.


I think most people on p5p agree.

The problem lies waaaaaaaaaaay back, when it was decided that, in the
absence of a locale in effect, none of the characters in the 128..255
range ought to be considered 'word' characters. So they don't match \w,
and are unaffected by uc/lc.

Then we got Unicode and UTF-8. But we also don't want to break existing
code. So for non-UTF-8 strings, we continue to let the "wordness" of the
characters in 128..255 be determined by the locale.
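
A minimal sketch of the behaviour described above, assuming a perl with no
locale in effect and nothing switched on that forces Unicode semantics. The
helper name describe() is made up for illustration; utf8::upgrade() and
utf8::is_utf8() are the core functions that change and report the internal
encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: report how perl treats a one-character string,
# depending on how that string happens to be stored internally.
sub describe {
    my ($s) = @_;
    return sprintf "%s: \\w %s, uc gives U+%04X",
        (utf8::is_utf8($s) ? "UTF-8" : "bytes"),
        ($s =~ /\w/ ? "matches" : "does not match"),
        ord(uc $s);
}

my $e = "\xE9";            # e-acute, code point 233, stored as one byte
print describe($e), "\n";  # non-UTF-8 string: no locale, so not a word char

utf8::upgrade($e);         # same character, now UTF-8 encoded internally
print describe($e), "\n";  # UTF-8 string: Unicode semantics apply
```

Under those assumptions the first line should report no \w match and an
unchanged U+00E9, the second a match and U+00C9. Same character, different
answer, purely because of the internal encoding.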


I wouldn't mind having a feature (or a pragma) that says "use Unicode
semantics, regardless of how the string is encoded".



Abigail
