perl.perl5.porters | Postings from November 2016

Re: Does the range operator still have the Unicode Bug?

Aaron Crane
November 20, 2016 15:20
Message ID:
Sawyer X <> wrote:
> On 10/30/2016 07:10 PM, Aristotle Pagaltzis wrote:
>> I would prefer to see this just fixed, for everyone, with cleaner code.
>> And it’s very *likely* that that can be done… just not *known*. A cycle
>> or two with warnings would give us data to calibrate the guess.
> Again, I'm not necessarily against that. I'm trying to add more
> considerations here. Perhaps the feature is the right place for it,
> using "unicode_strings".

On the assumption that a concrete change is easier to reason about
than the abstract situation, I attach a proposed patch for the Unicode
Bug in the range operator.

The patch itself is fairly straightforward; its guts look like this:

--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -1222,6 +1222,8 @@ PP(pp_flop)
            const char * const tmps = SvPV_nomg_const(right, len);

            SV *sv = newSVpvn_flags(lpv, llen, SvUTF8(left)|SVs_TEMP);
+            if (DO_UTF8(right) && IN_UNI_8_BIT)
+                len = sv_len_utf8_nomg(right);
            while (!SvNIOKp(sv) && SvCUR(sv) <= len) {
                if (strEQ(SvPVX_const(sv),tmps))

(Except that the change appears twice, because "foreach ($x .. $y)" has
an independent implementation that runs in constant memory.)

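That is, this change makes a stringy $x..$y honour the unicode_strings
feature: when the right-hand operand is stored as UTF-8, its length is
measured in characters rather than in bytes. It does so without any
warning.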

FWIW, my own view is that this change is simply a bugfix for ranges
under the unicode_strings feature, and that the current behaviour is
so bizarre and unpredictable that no warning is necessary. (Nor would
a warning be entirely useful, since we can't distinguish code that
wants the current behaviour, but neglected to utf8::decode the RHS,
from code that's been updated to take advantage of the new behaviour.)

Aaron Crane
