Re: Does the range operator still have the Unicode Bug?
From:
Aaron Crane
Date:
January 3, 2017 18:59
Subject:
Re: Does the range operator still have the Unicode Bug?
Message ID:
CACmk_tseENLOTp1RUAD7Fxf+r+1Wqs3h8J0dQ8U8Cx3mX3bA1w@mail.gmail.com
Karl Williamson <public@khwilliamson.com> wrote:
> The documentation needs to be patched. Besides the obvious ones, I found
> these: feature.pm, perlunicode, perluniintro (minor).
Thanks. I'd considered editing perlop, but couldn't find a way to
describe the change that didn't involve spending a lot of words
describing behaviour that I consider extremely confusing. The existing
perlop documentation is completely correct for the fixed behaviour, in
that it describes ".." in terms of string length and magical
increment.
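For readers who want to see that description in action, here is a minimal sketch of the string-range rule (magical increment until the endpoint matches or the next string would be longer than it). The variable names and the choice of "\x{100}" as an endpoint are illustrative assumptions, not taken from the branch or its tests; the last case is the kind of input where counting the endpoint's length in bytes rather than characters would matter, so its result is printed rather than asserted.

```perl
use strict;
use warnings;
use feature 'unicode_strings';

# Magical increment stops once the current string equals the endpoint.
my @simple = ('aa' .. 'ad');
print "@simple\n";    # aa ab ac ad

# The increment carries from 'z' to 'aa'; both are no longer than 'ab'.
my @carry = ('x' .. 'ab');
print "@carry\n";     # x y z aa ab

# Illustrative endpoint (an assumption for this sketch): one character,
# but more than one byte in Perl's internal UTF-8 representation.
# It is never produced by incrementing 'a', so only the length rule
# ends the range here.
my $end   = "\x{100}";
my @range = ('a' .. $end);
printf "%d elements, last one is %d character(s) long\n",
    scalar(@range), length $range[-1];
```

Comparing the final line's output with and without the C<use feature 'unicode_strings'> line, on perls before and after the change, is one way to observe the difference the documentation patch describes.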
I've pushed a rebased and updated version of that
smoke-me/arc/unicode-range-operator branch; the documentation changes
are also shown below. If there are any other documentation changes
that you think would be useful, please let me know. I'm hoping to get
this merged before the user-visible changes freeze.
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 152c34bbe2..33e52b31b3 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1814,6 +1814,16 @@ Prior to that, or outside its scope, no code points above 127 are quoted
in UTF-8 encoded strings, but in byte encoded strings, code points
between 128-255 are always quoted.
+=item *
+
+In the C<..> or L<range|perlop/Range Operators> operator.
+
+Starting in Perl 5.26.0, the range operator on strings treats their lengths
+consistently within the scope of C<unicode_strings>. Prior to that, or
+outside its scope, it could produce strings whose length in characters
+exceeded that of the right-hand side, where the right-hand side took up more
+bytes than the correct range endpoint.
+
=back
You can see from the above that the effect of C<unicode_strings>
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index beccd3c6a4..5b571fbbc1 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -151,9 +151,13 @@ serious Unicode work. The maintenance release 5.6.1 fixed many of the
problems of the initial Unicode implementation, but for example
regular expressions still do not work with Unicode in 5.6.1.
Perl v5.14.0 is the first release where Unicode support is
-(almost) seamlessly integrable without some gotchas (the exception being
-some differences in L<quotemeta|perlfunc/quotemeta>, and that is fixed
-starting in Perl 5.16.0). To enable this
+(almost) seamlessly integrable without some gotchas. (There are two
+exceptions. Firstly, some differences in L<quotemeta|perlfunc/quotemeta>
+were fixed starting in Perl 5.16.0. Secondly, some differences in
+L<the range operator|perlop/Range Operators> were fixed starting in
+Perl 5.26.0.)
+
+To enable this
seamless support, you should C<use feature 'unicode_strings'> (which is
automatically selected if you C<use 5.012> or higher). See L<feature>.
(5.14 also fixes a number of bugs and departures from the Unicode
diff --git a/regen/feature.pl b/regen/feature.pl
index 7a5671276e..66fc017da6 100755
--- a/regen/feature.pl
+++ b/regen/feature.pl
@@ -367,7 +367,7 @@ read_only_bottom_close_and_rename($h);
__END__
package feature;
-our $VERSION = '1.45';
+our $VERSION = '1.46';
FEATURES
@@ -484,7 +484,9 @@ potentially using Unicode in your program, the
C<use feature 'unicode_strings'> subpragma is B<strongly> recommended.
This feature is available starting with Perl 5.12; was almost fully
-implemented in Perl 5.14; and extended in Perl 5.16 to cover C<quotemeta>.
+implemented in Perl 5.14; and extended in Perl 5.16 to cover C<quotemeta>;
+and extended further in Perl 5.26 to cover L<the range
+operator|perlop/Range Operators>.
=head2 The 'unicode_eval' and 'evalbytes' features
--
Aaron Crane ** http://aaroncrane.co.uk/