# New Ticket Created by karl williamson
# Please include the string: [perl #75574]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=75574 >

My recently accepted patch added a synonym for a subroutine, one that I find much more informative. That has led me to try doing the same for two other names that I find especially confusing, and that have apparently confused others as well. I always thought the Sapir-Whorf hypothesis made a lot of sense, even though I was told when I studied it in college that it was discredited. (My daughter, who has a degree in linguistics, tells me that it is back in favor, and a quick look at Wikipedia confirms that.)

Anyway, I have found a few bugs so far in the code that have the same root cause: the failure to realize that when you have two string-like entities, each of them may or may not be in UTF8, which always gives 4 possibilities. Often the code fails to take one of those possibilities into account.

This bug is in regexec.c, and I wonder how prevalent it is there. In this file there is a pattern and a target to match against, and the 4 possibilities are always there. But the variable meaning the pattern is in UTF8 is 'UTF', and the variable meaning the target string is in UTF8 is 'do_utf8'. The bugs I've found stem from forgetting that the pattern can be in UTF8 even when 'do_utf8' is false, and the very name 'do_utf8', which applies only to the target, seems to me to lead one down this incorrect path.

It was an easy patch to change UTF to UTF_PATTERN and do_utf8 to utf8_target. It will help me, and hopefully others as well, to remember while scanning the code to always be cognizant of the 4 possibilities.
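
To make the 4 possibilities concrete, here is a minimal sketch in C. It is not code from regexec.c. The function names next_code_point and same_text are hypothetical, and the flag names utf8_pattern and utf8_target merely mirror the renaming proposed above. It assumes that a non-UTF8 string holds one code point (0..255) per byte, and it only decodes 1- and 2-byte UTF8 sequences, without rejecting overlong forms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from regexec.c): read one code point from *s.
 * If is_utf8 is false, every byte is its own code point (0..255).
 * If is_utf8 is true, decode a 1- or 2-byte UTF8 sequence; longer
 * sequences are left out to keep the sketch short. */
static bool next_code_point(const unsigned char **s, const unsigned char *end,
                            bool is_utf8, unsigned long *cp)
{
    const unsigned char *p = *s;

    if (p >= end)
        return false;
    if (!is_utf8 || p[0] < 0x80) {
        *cp = p[0];
        *s = p + 1;
        return true;
    }
    /* Two-byte sequence: 110xxxxx 10xxxxxx */
    if ((p[0] & 0xE0) == 0xC0 && p + 1 < end && (p[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        *s = p + 2;
        return true;
    }
    return false; /* malformed, or longer than this sketch handles */
}

/* Compare pattern and target code point by code point. Each side carries
 * its own flag, so all 4 combinations take the same path:
 *   pattern bytes / target bytes,  pattern bytes / target UTF8,
 *   pattern UTF8  / target bytes,  pattern UTF8  / target UTF8. */
static bool same_text(const unsigned char *pat, size_t pat_len, bool utf8_pattern,
                      const unsigned char *tgt, size_t tgt_len, bool utf8_target)
{
    const unsigned char *pat_end = pat + pat_len;
    const unsigned char *tgt_end = tgt + tgt_len;

    while (pat < pat_end && tgt < tgt_end) {
        unsigned long a, b;

        if (!next_code_point(&pat, pat_end, utf8_pattern, &a))
            return false;
        if (!next_code_point(&tgt, tgt_end, utf8_target, &b))
            return false;
        if (a != b)
            return false;
    }
    /* Equal only if both strings were consumed completely. */
    return pat == pat_end && tgt == tgt_end;
}
```

For example, U+00E9 is the single byte 0xE9 in a non-UTF8 string and the two bytes 0xC3 0xA9 in UTF8. With the flags set correctly, same_text reports a match for all 4 combinations of those two spellings. If the pattern is really in UTF8 but is passed with utf8_pattern false, which is the kind of slip described above, the comparison against the one-byte target fails.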