perl.perl5.porters | Postings from July 2014

Re: RFC: Making -B and -T work better on 8-bit encodings

Karl Williamson
July 23, 2014 04:14
On 05/10/2014 03:49 PM, bulk88 wrote:
> Karl Williamson wrote:
>> It seems to me that lowering the ratio, so that more than about
>> 15-20% non-text characters causes the file to be classified as binary,
>> while expanding the set of text characters by the 95 upper Latin1
>> printable characters (except for \xFF), will give good results, better
>> than the existing algorithm.
> Why do we use percentage cutoffs in the first place? Either it is
> printable/glyphable or it's not. perlfunc does document the percentage
> behavior, so I would guess ANY change to the algorithm would break
> backcompat for the few people willing to use such an unreliable algo. I
> would suggest leaving it alone as a backcompat/legacy/obsolete feature,
> or deprecating and removing -T/-B and telling people to use
> CPAN/something smarter for their specific purpose.
> Being purely printable doesn't mean a string/data is risk-free, but a
> fixed set of rules is better than a % "guess".
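A minimal Python sketch contrasting the two positions quoted above, assuming a block of bytes has already been read from the start of the file. `classify_by_ratio` follows the proposal (a lower cutoff, with the upper Latin1 printables counted as text), and `classify_strict` follows bulk88's fixed rule with no percentage. The 15% default, the whitespace set, and all names here are illustrative assumptions, not Perl's implementation.

```python
# Hypothetical byte sets for illustration; neither is Perl's actual table.
ASCII_TEXT = frozenset(range(0x20, 0x7F)) | frozenset(b"\t\n\r\f\b")
# Proposal: also accept the 95 upper Latin-1 printables 0xA0..0xFE (0xFF excluded).
LATIN1_TEXT = ASCII_TEXT | frozenset(range(0xA0, 0xFF))


def classify_by_ratio(block: bytes, threshold: float = 0.15) -> str:
    """Percentage rule: 'binary' if more than `threshold` of the bytes are non-text."""
    if not block:
        return "text"  # assumption: an empty block counts as text
    if 0 in block:
        return "binary"  # a zero byte forces binary, as perlfunc documents for -B
    non_text = sum(1 for b in block if b not in LATIN1_TEXT)
    return "binary" if non_text / len(block) > threshold else "text"


def classify_strict(block: bytes) -> str:
    """Fixed rule: 'text' only if every byte is in the allowed set."""
    return "text" if all(b in LATIN1_TEXT for b in block) else "binary"
```

For example, with `sample = "café crème\x07\n".encode("latin-1")`, the ratio rule returns `'text'` (1 odd byte in 12 is under 15%), while the strict rule returns `'binary'` because the single BEL byte is enough. That is both the predictability and the brittleness of a fixed rule.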

From what he has said privately, I think RJBS pretty much agrees with
this, and that any breakage would likely show up in the field rather
than when smoking CPAN, as it would depend on real-world data that is
not terribly likely to be in the test files.

This is a pity, as I have 20-year-old code that would benefit from it.
I do intend to fix the currently broken UTF-8 handling and to add some
documentation about it.
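The post does not say exactly what is broken in the current UTF-8 handling. As an assumption only, one common way to treat a leading block as UTF-8 text is to accept it if it decodes cleanly, tolerating a multi-byte character cut off at the end of the block, and (also an assumption) to require at least one non-ASCII byte so that pure ASCII falls through to the ordinary text/binary check. A Python sketch, with the hypothetical name `looks_like_utf8`:

```python
import codecs


def looks_like_utf8(block: bytes) -> bool:
    """True if block is valid UTF-8 containing at least one non-ASCII byte.

    An incremental decoder called with final=False buffers an incomplete
    sequence at the end instead of raising, so a character split by the
    block boundary is still accepted.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(block, final=False)
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in block)
```

The same words encoded as Latin-1 fail this check, since a byte like `\xe9` followed by a space is not a valid UTF-8 sequence.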

When I knew about it, the 'file' command worked by examining a
particular location in the text segment of a file to see what its
'magic number' was. A registry of magic numbers was kept somewhere, and
the number would indicate the type of file, such as (to use a modern
example) PDF. Anything that didn't look like a magic number would start
a guessing process, such as looking for troff commands.
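The two-stage approach described above, a lookup of known magic numbers followed by heuristic guessing, can be sketched in Python. The sketch checks only the leading bytes of the block; the tiny `MAGIC` table stands in for the real registry (file(1) uses a much larger magic database), and the troff pattern and the name `guess_type` are illustrative assumptions.

```python
import re

# Hypothetical miniature registry: leading bytes -> description.
MAGIC = {
    b"%PDF-": "PDF document",
    b"\x7fELF": "ELF executable",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}

# Heuristic for troff input: a line starting with '.' and a one- or two-letter request.
TROFF_REQUEST = re.compile(rb"^\.[A-Za-z]{1,2}\b", re.MULTILINE)


def guess_type(block: bytes) -> str:
    """Check known magic numbers first, then fall back to guessing."""
    for magic, description in MAGIC.items():
        if block.startswith(magic):
            return description
    if TROFF_REQUEST.search(block):
        return "troff input (guessed)"
    return "unknown"
```

Ordering matters in this design: an exact magic-number match is trusted over any heuristic, and guessing only happens when no registered signature applies.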
