* Karl Williamson <public@khwilliamson.com> [2014-05-02T16:28:48]
> The -B and -T file test operators don't really work well for non-ASCII text
> files, except perhaps under 'use locale'
>
> I'm proposing to change them in several ways early in 5.21 to make sure that
> it doesn't adversely affect existing code before we decide for 5.22.

In general, these changes look like an improvement to me.  I'd want to see
more opinions, if possible, as I don't use -T or -B in my usual work.

> The current UTF8 handling is haphazard, and I think suboptimal.  The patch
> changes things to see if the entire block is ASCII.  If not, it then looks
> to see if it is one long UTF-8 string.  The odds of something that passes
> that test for 512 bytes not being UTF-8 are vanishingly small.

Are we concerned about ending in the middle of a multibyte-sequence?

> I also think that Vertical Tab and Form Feed are so infrequent that they
> should be counted as non-text (currently VT is non-text, but FF is)

Seems reasonable to me.

> It represents a y with diaeresis in Latin1, which I believe occurs in
> modern French in only a couple of place names, and is not used in the other
> languages that Latin1 is designed for.

Also in the name of once-excellent Seattle hard rock band Queensrÿche.  Let's
not forget them!

> So I would change the patch to classify \xFF as a control.

Works for me, despite my love for the aforementioned band's early work.  :)

-- 
rjbs
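For illustration, here is a minimal Python sketch of the classification described in this thread. It is not Perl's own C implementation. The block is rejected on a NUL byte. An all-ASCII block goes to odd-byte counting. A non-ASCII block is accepted as text if it is one long UTF-8 string, and otherwise falls back to counting odd bytes. VT and FF count as non-text, and \xFF counts as a control. The function names, the `latin1` flag, the exact set of control bytes treated as text, and the 30% cutoff are all assumptions made for this sketch. It also answers the split-sequence question one possible way: an incremental UTF-8 decoder called with `final=False` accepts a block whose last multibyte sequence is cut off by the 512-byte boundary, while still rejecting malformed bytes.

```python
import codecs

BLOCK_SIZE = 512      # bytes examined, per the thread; an assumption in this sketch
ODD_THRESHOLD = 0.30  # assumed cutoff: above this fraction of odd bytes means "binary"

# C0 control bytes counted as text in this sketch (assumed set). VT (0x0B)
# and FF (0x0C) are deliberately absent, following the proposal to treat
# both as non-text.
TEXT_CONTROLS = frozenset(b"\t\n\r\b\x1b")


def is_utf8_allowing_split_tail(block: bytes) -> bool:
    """True if `block` is valid UTF-8, tolerating a multibyte sequence
    that is cut off by the end of the block."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False buffers an incomplete trailing sequence instead of
        # raising; genuinely malformed bytes still raise UnicodeDecodeError.
        decoder.decode(block, final=False)
    except UnicodeDecodeError:
        return False
    return True


def is_odd_byte(byte: int, latin1: bool) -> bool:
    """Classify one byte outside the valid-UTF-8 path (hypothetical helper)."""
    if byte < 0x20:
        return byte not in TEXT_CONTROLS
    if byte < 0x80:
        return False
    if latin1:
        # Printable Latin-1 (0xA0..0xFE) is text; C1 controls (0x80..0x9F)
        # and 0xFF are odd, per the proposal to classify \xFF as a control.
        return not (0xA0 <= byte <= 0xFE)
    return True  # without a Latin-1 assumption, any high-bit byte is odd


def looks_like_text(data: bytes, latin1: bool = False) -> bool:
    """Hypothetical -T style check on the first BLOCK_SIZE bytes of `data`."""
    block = data[:BLOCK_SIZE]
    if not block:
        return True  # empty input counts as text in this sketch
    if b"\x00" in block:
        return False  # a NUL byte anywhere in the block means binary
    if not block.isascii() and is_utf8_allowing_split_tail(block):
        return True  # one long UTF-8 string: text
    odd = sum(1 for byte in block if is_odd_byte(byte, latin1))
    return odd / len(block) <= ODD_THRESHOLD
```

As a usage example, `looks_like_text("€".encode("utf-8") * 200)` returns `True` even though the 512-byte cut lands inside a three-byte sequence. Without the split-tail tolerance, every byte in that block would be counted as odd. With `latin1` off, a high-bit byte that is not part of valid UTF-8 always counts as odd. Turning it on approximates a Latin-1 locale, in which \xFF is grouped with the controls.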