perl.perl5.porters | Postings from May 2010

From: karl williamson
Date: May 10, 2010 12:04
Subject: RFC: interactions between "use bytes", "use locale", and "use feature 'unicode_strings'"
Message ID: 4BE858A6.5000200@khwilliamson.com

I am waiting for blead to reopen before I submit a patch extending the
unicode_strings feature to matching \s and \w.  In documenting it, it
occurred to me that the existing implementation is wrong, as is the
existing interaction between use bytes and use locale.

To refresh your memory, "use feature 'unicode_strings'" is supposed to
mean that even non-utf8 data is treated with Unicode semantics.  In
practice, this only affects the characters in the range 128-255.

It is new in 5.12, where it is implemented only for operations that
change case, such as ucfirst() and \L...\E in s/.../\L...\E/.
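For readers who have not used the feature yet, here is a minimal sketch of the case-changing behavior it controls on a non-utf8 string. It assumes perl 5.12 or later and no use locale in effect. The variable names are illustrative only.

```perl
use strict;
use warnings;

# "\xe9" is LATIN SMALL LETTER E WITH ACUTE, stored here in a
# non-utf8 (byte) string, so it is one of the characters 128-255.
my $word = "\xe9t\xe9";

my ($plain_ucfirst, $plain_lc);
{
    # Feature off, no locale: only ASCII letters change case.
    $plain_ucfirst = ucfirst $word;      # leading "\xe9" is left alone
    $plain_lc      = "\L\xc9T\xc9\E";    # only the "T" is lowered
}

my ($uni_ucfirst, $uni_lc);
{
    use feature 'unicode_strings';
    # Feature on: characters 128-255 follow Unicode casing rules too.
    $uni_ucfirst = ucfirst $word;        # leading char becomes "\xc9"
    $uni_lc      = "\L\xc9T\xc9\E";      # each "\xc9" becomes "\xe9"
}

printf "ucfirst: %vX vs %vX\n", $plain_ucfirst, $uni_ucfirst;
printf "\\L...\\E: %vX vs %vX\n", $plain_lc, $uni_lc;
```

The only difference between the two blocks is the lexically scoped feature. The string data is the same in both.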

As implemented, it is always subservient to 'use bytes' and
'use locale'.  Thus, in
	use bytes;
	# code 1
	{
	    use feature "unicode_strings";
	    # code 2
	}
	# code 3

byte semantics are used throughout.  It seems to me that Unicode
semantics should instead be used in code 2, and byte semantics in
codes 1 and 3.

The same goes for locale: unicode_strings is subservient to it, and I
think the two should instead behave as I just suggested for bytes, with
the innermost pragma taking effect.

I also think that bytes and locale should override each other in the
same way, which they currently don't.

I believe hint_bits must already be stacked by the implementation, so
that when an inner block is popped, the outer block's value is
automatically restored.
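Here is a minimal sketch of that existing scoping, using the code 1/2/3 layout from the example above but without use bytes. It assumes perl 5.12 or later and no use locale in scope. The variable names are illustrative.

```perl
use strict;
use warnings;

my @results;

push @results, ucfirst "\xe9";        # code 1: feature not in effect
{
    use feature 'unicode_strings';    # hint is set at compile time for this block only
    push @results, ucfirst "\xe9";    # code 2: Unicode semantics
}
push @results, ucfirst "\xe9";        # code 3: outer hints restored when the block closed

# prints "E9 C9 E9"
print join(' ', map { sprintf('%vX', $_) } @results), "\n";
```

The feature's hint reverts as soon as the inner block ends. That is the same stacking the proposed bytes/locale/unicode_strings overrides would need.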

Any comments?



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org