On 11/20/2016 08:20 AM, Aaron Crane wrote:
> Sawyer X <xsawyerx@gmail.com> wrote:
>> On 10/30/2016 07:10 PM, Aristotle Pagaltzis wrote:
>>> I would prefer to see this just fixed, for everyone, with cleaner code.
>>> And it’s very *likely* that that can be done… just not *known*. A cycle
>>> or two with warnings would give us data to calibrate the guess.
>>
>> Again, I'm not necessarily against that. I'm trying to add more
>> considerations here. Perhaps the feature is the right place for it,
>> using "unicode_strings".
>
> On the assumption that a concrete change is easier to reason about
> than the abstract situation, I attach a proposed patch for the Unicode
> Bug in the range operator.
>
> The patch itself is fairly straightforward; its guts look like this:
>
> --- a/pp_ctl.c
> +++ b/pp_ctl.c
> @@ -1222,6 +1222,8 @@ PP(pp_flop)
>          const char * const tmps = SvPV_nomg_const(right, len);
>
>          SV *sv = newSVpvn_flags(lpv, llen, SvUTF8(left)|SVs_TEMP);
> +        if (DO_UTF8(right) && IN_UNI_8_BIT)
> +            len = sv_len_utf8_nomg(right);
>          while (!SvNIOKp(sv) && SvCUR(sv) <= len) {
>              XPUSHs(sv);
>              if (strEQ(SvPVX_const(sv),tmps))
>
> (Except twice, because "foreach ($x .. $y)" has an independent
> implementation that takes constant memory.)
>
> That is, this change makes stringy $x..$y honour the unicode_strings
> feature, without any warning.
>
> FWIW, my own view is that this change is simply a bugfix for ranges
> under the unicode_strings feature, and that the current behaviour is
> so bizarre and unpredictable that no warning is necessary. (Or even
> entirely useful, since we can't distinguish between code that wants
> the current behaviour (but neglected to utf8::decode the RHS) and code
> that's been updated to take advantage of the new behaviour.)
>

As I believe it has been pointed out before, the use of that feature implies that the user wants proper handling of unicode strings.
That is why, in earlier releases, the feature was extended to cover more things (quotemeta, for example) as they were unearthed, instead of creating extra features. Thus, treating this as a bug fix follows the paradigm we have already established.
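
To make the length distinction in the quoted patch concrete, here is a minimal standalone sketch in C. It is not perl's implementation: `utf8_char_count` and `range_limit` are hypothetical names that only model what `sv_len_utf8_nomg` and the patched condition do, and the sketch assumes the input is well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the perl API: count the code points
 * in a buffer assumed to hold well-formed UTF-8, by skipping
 * continuation bytes (those of the form 10xxxxxx). */
size_t utf8_char_count(const char *s, size_t byte_len)
{
    size_t chars = 0;

    assert(s != NULL);
    for (size_t i = 0; i < byte_len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* Hypothetical model of the patched length choice: use the character
 * count only when the right operand is flagged as UTF-8 and
 * unicode_strings is in effect; otherwise keep the byte length,
 * as before the patch. */
size_t range_limit(const char *right, size_t byte_len,
                   int is_utf8, int unicode_strings)
{
    if (is_utf8 && unicode_strings)
        return utf8_char_count(right, byte_len);
    return byte_len;
}
```

For the two-byte encoding of "é" (bytes 0xC3 0xA9), `range_limit` returns 1 when both flags are set and 2 otherwise, which is the difference the patch introduces for the loop's length bound.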