On 07/22/2014 10:14 PM, Karl Williamson wrote:
> On 05/10/2014 03:49 PM, bulk88 wrote:
>> Karl Williamson wrote:
>>> It seems to me that lowering the ratio, so that greater than about
>>> 15-20% non-text causes the file to be classified as binary, while
>>> expanding the text characters by the 95 upper Latin1 printable
>>> characters (except for \xFF), will give good results, better than
>>> the existing behavior.
>>
>> Why do we use percent cutoffs in the first place? Either it is
>> printable/glyphable or it's not. perlfunc does document the %
>> behavior, so I would guess ANY change to the algorithm would break
>> backcompat for the few people willing to use such an unreliable
>> algo. I would suggest leaving it alone as a
>> backcompat/legacy/obsolete feature, or deprecating and removing
>> -T/-B and telling people to use CPAN/something smarter for their
>> specific purpose.
>>
>> Being purely printable doesn't mean a string/data is risk-free, but
>> a fixed set of rules is better than a % "guess".
>>
>> http://www.blackhatlibrary.net/Shellcode/Null-free
>> http://www.blackhatlibrary.net/Ascii_shellcode
>
> From what he has said privately, I think RJBS pretty much agrees with
> this, and that breakage would likely come in the field rather than
> from smoking CPAN, as it would depend on real-world data that is not
> terribly likely to be in the test files.
>
> This is a pity, as I have 20-year-old code that would benefit from
> it. I do intend to fix the current broken UTF-8 handling and add some
> documentation about it.

Now in blead with f13c8ddbfb6aa3d08ad6f1ea2a66babf3f782cac. I avoided
any changes to the original semantics, except to fix obvious bugs.
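
For illustration, below is a minimal Perl sketch of the kind of
-T-style heuristic under discussion. The ~20% cutoff and the 95 upper
Latin-1 printables (minus \xFF) follow the proposal quoted above, so
this is a sketch of the proposed idea, not perl's actual internal
implementation; the NUL-byte and empty-file cases mirror the documented
-T behavior, and the script name and 512-byte sample size are assumed
for the example.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Classify a file as text or binary by sampling its first block
    # and counting bytes that fall outside a "text" set.
    sub looks_like_text {
        my ($path) = @_;
        open my $fh, '<:raw', $path or die "can't open $path: $!";
        my $read = read $fh, my $buf, 512;
        close $fh;
        return 1 if !$read;                  # empty file counts as text
        return 0 if index($buf, "\0") >= 0;  # any NUL byte means binary

        my $odd = 0;
        for my $byte (unpack 'C*', $buf) {
            next if $byte >= 0x20 && $byte <= 0x7E;   # ASCII printable
            next if $byte == 0x09 || $byte == 0x0A
                 || $byte == 0x0C || $byte == 0x0D;   # tab, NL, FF, CR
            next if $byte >= 0xA0 && $byte <= 0xFE;   # upper Latin-1, sans \xFF
            $odd++;
        }
        return $odd / $read <= 0.20;   # binary once odd bytes exceed ~20%
    }

    printf "%s: %s\n", $_, looks_like_text($_) ? 'text' : 'binary' for @ARGV;

Note that this byte-level sketch deliberately ignores UTF-8 encoded
text, which is exactly the broken handling Karl says he intends to fix.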