On 11/28/2010 2:34 AM, demerphq wrote:
> On 27 November 2010 23:53, Reverend Chip <rev.chip@gmail.com> wrote:
>> On 11/27/2010 4:06 AM, Nicholas Clark wrote:
>>> On Fri, Nov 26, 2010 at 01:03:20PM -0800, Reverend Chip wrote:
>>>> You seriously equate Encode::_utf8_on() with, say, playing around with
>>>> optrees using B? You seriously equate a bad pointer in an SV to a
>>>> misplaced byte in a utf8 string?
>>> Yes. Totally.
>> There are some similarities, but since the ':utf8' layer just slaps the
>> utf8 bit on whatever comes in, the situations are not identical.
> To me *that* is the bug that you should be reviewing.

False dilemma fallacy.

>> It's
>> obvious to me that since a regex can die of an assertion due to bad
>> input data, then we might at least want to clue the user in about which
>> regex is dying so he can guard it.
> If a better error message was possible *without* having to validate
> the utf8 string then I would say you are right.

It's probably a matter of assert->croak, plus some cleanup. I'll look
into it.

> However if it means we have to validate the string every time we do a
> utf8 operation then I would say you are wrong.

Slippery slope fallacy.

>> Since no one is chiming in to agree
>> with me, I guess I'll just stop. I'm quite disappointed by the apparent
>> lack of concern for the basic usability issues. I'm left with no option
>> but to think of it as evolution in action.
> I think this is unfair. We care about usability issues, however we
> don't agree on the characterization of the *source* of the bug.

That's an incorrect summary of the disagreement. We all know where the
bug is coming from. We disagree on what the core should do, if
anything, to detect, prevent, survive, and/or report it.

My appeal to usability, in this case, is mostly about assert->croak, and
my as yet unwritten followup would be about assert->warn instead. Since
you agree with me about croak, I think you're right: we're both concerned
about usability.

{argument from false equivalence elided}

> However I would argue that the :utf8 not validating input before
> marking the string as utf8 is probably a bug [...]

Given the extant attitude toward crashing when the utf8 is invalid, I
tend to agree.

>> I actually use it (properly) in conjunction with utf8::valid to detect
>> and repair double encoding, so I'm very happy it's available. If it
>> weren't, I'd have to write it.
> Do you mean Encode::_utf8_off()?

Both. The basic logic is, more or less:

    attempt to downgrade
    if successful, then _utf8_on() the byte result;
        if utf8::valid then we're done, because we've fixed a double encoding.
        otherwise, _utf8_off() and upgrade again.

Using the _utf8_* functions avoids useless data copies. Granted this is
not 100.0% reliable, but it works for web site purposes. The need for
this is based entirely on a situation where people were unforgivably
sloppy with encoding, and cleaning the old data is both expensive and
(as a result of this hack) unnecessary. In these circumstances, it's a
handy function for us.
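
For anyone following along, the downgrade / _utf8_on() / utf8::valid
sequence described above might look roughly like the sketch below. This
is only an illustration of that outline, not the actual production
code: the sub name fix_double_encoding is hypothetical, and it works on
a copy of its argument for clarity, which gives up some of the
copy-avoidance mentioned above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (the name is an assumption, not from the post).
# Takes a character string and returns it with one layer of double
# encoding removed, if the string looks double-encoded.
sub fix_double_encoding {
    my ($str) = @_;
    my $copy = $str;    # work on a copy so the caller's scalar is untouched

    # Attempt to downgrade. With a true second argument, utf8::downgrade
    # returns false instead of dying when a character is above 0xFF;
    # such a string cannot be a Latin-1 misreading of UTF-8 bytes.
    return $str unless utf8::downgrade($copy, 1);

    # $copy is now held as bytes. Flip the flag so those same bytes are
    # treated as UTF-8, without copying or re-encoding anything.
    Encode::_utf8_on($copy);

    # Well-formed UTF-8 means the bytes really were an encoded string:
    # the double encoding has been undone.
    return $copy if utf8::valid($copy);

    # Otherwise the guess was wrong: drop the flag and upgrade again,
    # which restores the original characters.
    Encode::_utf8_off($copy);
    utf8::upgrade($copy);
    return $copy;
}
```

As noted, this is a heuristic: a string that legitimately contains
characters which happen to downgrade to well-formed UTF-8 bytes (for
example U+00C3 followed by U+00A9) will be "repaired" even though
nothing was wrong with it.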