On Sun, Apr 11, 2010 at 09:07:08PM -0700, Karl Williamson via RT wrote:
> I'm preparing a patch for this bug, and I'm uncertain about the best way
> to do it.
>
> First, the bug is caused by the code not realizing that when you have
> two strings that independently may be in utf8 or not, that there are 4
> cases to take care of. I mention this because the error of only taking
> care of 3 of the cases occurs in other places in the code as well.
>
> The code does not consider the possibility that the replacement string
> could be in utf8 when the source/target string isn't. Thus
>
>     $latin1 =~ s/latin1/utf8/;
>
> fails. The solution is to upgrade the variable to utf8. My dilemma is
> whether to always do the upgrade when the replacement string is in utf8,
> or to do it only if the match succeeds. The difference can lead to
> different results later, as if there is no upgrade, the scalar's
> characters in the 128-255 range will have different semantics than if
> the upgrade takes place.
>
> I'm leaning towards doing the upgrade, as I think we can infer from the
> replacement string being in utf8 that the programmer intended that the
> string have Unicode semantics, even if it isn't in utf8. Therefore,
> it's better to do the upgrade to force those semantics.
>
> Is there a contrary opinion?

Yes! I think it could just as validly be argued that the programmer
only intended the utf8 upgrade for the cases that matched. Which makes
us even.

Then I think the tie-breaker is that we should try to be as conservative
as possible and only upgrade when we need to.

--
In economics, the exam questions are the same every year.
They just change the answers.