On Mon, 30 Jan 2017 18:24:38 -0800, khw wrote:
> It so happens that the intermediate form is unchanged from the input for
> ASCII literal characters that don't have ranges, so this works
>
>   tr'abc'def'
>
> but ranges don't work
>
>   tr'a-z'A-Z' # translates only the three chars 'a', '-', 'z'
>
> I haven't investigated \t, etc.
>
> My guess is that it has been broken from the beginning, and it's amazing
> to me that it has gone undiscovered for so long, requiring a fuzzer,
> with the red herring of UTF-8 being involved.
>
> One possibility to fix this is to simply prohibit single-quotish
> behavior with tr, by putting in a check somewhere in the parsing. It
> clearly isn't something that is affecting many, if any, people.
>
> Otherwise, the portion of scan_const that deals with this would have to
> be pulled out into a separate function, called as well from whatever
> part of toke deals with single quotish. I believe that
>
>   tr '\xDF'\xFE'
>
> should not be evaluated as double-quotish, for example.
>
> Since I don't have much knowledge of the parser's overall operation,
> suggestions are welcome. Or volunteers.

We'd need to modify S_tokeq() to handle ranges in single-quotish tr'''.
Possibly it could check for broken Unicode to avoid #130675.

I don't think tr''' should support escapes beyond \'

Tony

---
via perlbug: queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=130679