develooper Front page | perl.perl5.porters | Postings from August 2010

Re: unicode_strings feature

karl williamson
August 7, 2010 20:15
Eric Brine wrote:
> Hi,
> The documented purpose of unicode_strings is to make these equivalent.
>> perl -le"my $s = chr(0xE1); utf8::upgrade($s); print $s =~ /[[:alpha:]]/ ?1:0"
> 1
>> perl -le"use feature 'unicode_strings'; my $s = chr(0xE1); print $s =~ /[[:alpha:]]/ ?1:0"
> 0
> What am I missing?
> Eric Brine
> Perl 5.12.1


=head2 the 'unicode_strings' feature

C<use feature 'unicode_strings'> tells the compiler to treat
all strings outside of C<use locale> and C<use bytes> as Unicode. It is
available starting with Perl 5.11.3.

See L<perlunicode/The "Unicode Bug"> for details.

That section in perlunicode.pod is too long to extract here, but it
lists the 4 affected areas, including the one you mentioned, and says
that the only area implemented is the one for changing case, such as
uc().  I thought I had submitted a patch to make that 5 areas, with
more detail about things like [[:alpha:]], but it isn't in blead, nor
in my work areas.  I don't know what happened there.
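
As a rough illustration of the one area that is implemented, the
sketch below compares uc() on chr(0xE1) in three situations: a native
8-bit string, a UTF-8-upgraded copy, and the native string inside the
scope of the feature.  The variable names are made up for the example,
and it assumes a perl of 5.12 or later run without "use locale" or
"use bytes".

```perl
use strict;
use warnings;

# U+00E1 LATIN SMALL LETTER A WITH ACUTE, held as a single native byte.
my $s = chr(0xE1);

# Same character, but stored internally as UTF-8.
my $upgraded = $s;
utf8::upgrade($upgraded);

# Outside the feature, a non-UTF-8 string gets byte semantics,
# so uc() leaves the character alone.
my $native_uc = uc($s);

# An upgraded string always gets Unicode semantics.
my $upgraded_uc = uc($upgraded);

# Inside the feature's lexical scope, the native string is
# treated as Unicode as well.
my $feature_uc;
{
    use feature 'unicode_strings';
    $feature_uc = uc($s);
}

printf "native, no feature:   U+%04X\n", ord($native_uc);    # U+00E1
printf "upgraded, no feature: U+%04X\n", ord($upgraded_uc);  # U+00C1
printf "native, with feature: U+%04X\n", ord($feature_uc);   # U+00C1
```

The first and second lines differing is the "Unicode bug"; the third
line agreeing with the second is what the feature buys you for case
changing.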

My main goal for 5.14 is to fix the Unicode bug completely.  I have a
patch prepared that fixes it for \s and \w.  That patch was submitted
in time for 5.12, but was rejected for lack of the regex modifiers, so
I am waiting on the regex modifiers decision before submitting both
things in the same patch sequence.
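
For reference, the \s and \w inconsistency can be seen with something
like the following.  This is only a sketch of the symptom, not
anything taken from the patch; match_both() is a hypothetical helper
written for this example, and the code deliberately runs without the
feature, without "use locale", and without any regex modifiers.

```perl
use strict;
use warnings;

# Hypothetical helper: match the same character once as a native
# 8-bit string and once as a UTF-8-upgraded string.
sub match_both {
    my ($char, $re) = @_;

    my $native = $char;
    utf8::downgrade($native);     # force the 8-bit representation

    my $upgraded = $char;
    utf8::upgrade($upgraded);     # force the UTF-8 representation

    return ( ($native   =~ $re ? 1 : 0),
             ($upgraded =~ $re ? 1 : 0) );
}

# U+00A0 NO-BREAK SPACE against \s
my @space = match_both(chr(0xA0), qr/\s/);

# U+00E9 LATIN SMALL LETTER E WITH ACUTE against \w
my @word = match_both(chr(0xE9), qr/\w/);

print "\\s on U+00A0: native=$space[0] upgraded=$space[1]\n";  # native=0 upgraded=1
print "\\w on U+00E9: native=$word[0] upgraded=$word[1]\n";    # native=0 upgraded=1
```

The same character gives a different answer depending only on how the
string happens to be stored internally.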

However, it is not clear that [[:alpha:]] will ever match chr(0xe1).
This list needs to have a conversation about that soon, but I've held
off starting it to avoid putting too many Unicode things out there at
the same time.  I plan to outline the issues in a post as soon as there
is a resolution to the regex modifiers, which I anticipate happening in
a few days.
