Re: RFC: Making -B and -T work better on 8-bit encodings

From: Karl Williamson
Date: July 23, 2014 04:14
Subject: Re: RFC: Making -B and -T work better on 8-bit encodings
Message ID: 53CF36A8.40108@khwilliamson.com

On 05/10/2014 03:49 PM, bulk88 wrote:
> Karl Williamson wrote:
>> It seems to me that by lowering the ratio so that greater than about
>> 15-20% non-text cause the file to be classified as binary, while
>> expanding the text characters by the 95 upper Latin1 printable
>> characters (except for \xFF) will give good results, better than the
>> existing.
>
> Why do we use percent cutoffs in the first place? Either it is
> printable/glyphable or it's not. perlfunc does document the %s behavior,
> so I would guess ANY change to the algorithm would break backcompat for
> the few people willing to use such an unreliable algo. I would suggest
> to leave it alone as a backcompat/legacy/obsolete feature, or deprecate
> and remove -T/-B and tell people to use CPAN/something smarter for their
> specific purpose.
>
> Being purely printable doesn't mean a string/data is risk free but a
> fixed set of rules is better than a % "guess".
>
> http://www.blackhatlibrary.net/Shellcode/Null-free
> http://www.blackhatlibrary.net/Ascii_shellcode
>

From what he has said privately, I think RJBS pretty much agrees with
this. Any breakage would likely show up in the field rather than when
smoking CPAN, since it would depend on real-world data that is not
terribly likely to be in the test files.

This is a pity, as I have 20-year-old code that would benefit from the
change. I do intend to fix the currently broken UTF-8 handling, and to
add some documentation about it.
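
For readers who want to see the two rules side by side, here is a rough
sketch in Python, not the code in perl itself. It assumes the current rule
counts every high-bit byte as "odd" and uses the 30% cutoff that perlfunc
documents, and it takes 15% as one point in the proposed 15-20% range. The
names looks_binary, ASCII_TEXT and LATIN1_UPPER, and the exact set of
allowed control characters, are made up for illustration.

```python
# Sketch only: the byte sets and helper name are assumptions, not Perl's source.

# Bytes treated as "text" by the ASCII-only rule: printable ASCII plus a few
# common control characters (backspace, tab, newline, form feed, CR, escape).
ASCII_TEXT = frozenset(range(0x20, 0x7F)) | {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}

# The 95 upper Latin-1 printable characters, 0xA0 through 0xFE (0xFF excluded).
LATIN1_UPPER = frozenset(range(0xA0, 0xFF))


def looks_binary(block, text_bytes, max_odd_ratio):
    """Return True if the block would be classified as binary."""
    if not block:
        return False  # nothing to count in an empty block
    if 0 in block:
        return True  # a NUL byte is treated as decisive
    odd = sum(1 for b in block if b not in text_bytes)
    return odd / len(block) > max_odd_ratio


# 4 of these 10 bytes have the high bit set (Latin-1 encoded letters).
latin1_sample = "ÉÉÉÉabcdef".encode("latin-1")
# 2 of these 10 bytes are control codes outside the allowed set.
control_sample = b"\x01\x02abcdefgh"

for name, sample in (("latin1", latin1_sample), ("control", control_sample)):
    old = looks_binary(sample, ASCII_TEXT, 0.30)
    new = looks_binary(sample, ASCII_TEXT | LATIN1_UPPER, 0.15)
    print(name, "old rule binary:", old, "proposed rule binary:", new)
```

Under these assumptions the Latin-1 sample flips from binary to text, and
the sample with 20% control codes flips from text to binary, which is the
kind of change in the field that the backcompat concern is about.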

The 'file' command used to work, when I knew about it, by examining a
particular location in a file to see what its 'magic number' was. A
registry of these numbers was kept somewhere, and a match would
indicate the type of file, like modern-day PDF, etc.  Something that
didn't look like it had a magic number would start a guessing process,
like looking for troff commands.
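
The two-stage approach described above can be sketched roughly as follows,
in Python. This is an illustration of the idea and not how 'file' is
actually implemented: the tiny MAGIC table, the function name guess_type
and the troff test (a line starting with a dot and two letters, such as
.TH or .SH) are all assumptions chosen to keep the example short.

```python
import re

# A tiny, hand-picked "registry" of leading magic numbers (illustrative only).
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\x7fELF", "elf"),
    (b"PK\x03\x04", "zip"),
]

# Fallback guess: troff requests are lines starting with '.' and two letters.
TROFF_REQUEST = re.compile(rb"^\.[A-Za-z]{2}\b", re.MULTILINE)


def guess_type(data):
    """Check the magic-number table first, then fall back to guessing."""
    for magic, kind in MAGIC:
        if data.startswith(magic):
            return kind
    if TROFF_REQUEST.search(data):
        return "troff"
    return "unknown"


print(guess_type(b"%PDF-1.4\n"))
print(guess_type(b".TH LS 1\n.SH NAME\nls\n"))
```

The point of the ordering is that a fixed signature gives a definite
answer, and content-based guessing only starts when no signature matches.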
