perl.perl5.porters | Postings from August 2014

Re: RFC: Making -B and -T work better on 8-bit encodings

Karl Williamson
August 21, 2014 19:26
On 07/22/2014 10:14 PM, Karl Williamson wrote:
> On 05/10/2014 03:49 PM, bulk88 wrote:
>> Karl Williamson wrote:
>>> It seems to me that lowering the ratio, so that more than about
>>> 15-20% non-text characters cause the file to be classified as binary,
>>> while expanding the set of text characters by the 95 upper Latin1
>>> printable characters (all except \xFF), will give better results than
>>> the existing algorithm.
>> Why do we use percent cutoffs in the first place? Either it is
>> printable/glyphable or it's not. perlfunc does document the percentage behavior,
>> so I would guess ANY change to the algorithm would break backcompat for
>> the few people willing to use such an unreliable algo. I would suggest
>> to leave it alone as a backcompat/legacy/obsolete feature, or deprecate
>> and remove -T/-B and tell people to use CPAN/something smarter for their
>> specific purpose.
>> Being purely printable doesn't mean a string/data is risk free but a
>> fixed set of rules is better than a % "guess".
> From what he has said privately, I think RJBS pretty much agrees with
> this, and that breakage will likely come in the field rather than by
> smoking CPAN, as it would depend on real-world data, not terribly likely
> to be in the test files.
> This is a pity, as I have 20-year-old code that would benefit from it. I
> do intend to fix the current broken UTF-8 handling, and add some
> documentation about it.

This is now in blead as commit f13c8ddbfb6aa3d08ad6f1ea2a66babf3f782cac.
I avoided any changes to the original semantics, except to fix obvious bugs.
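
For readers following the thread, here is a minimal sketch of the kind of ratio heuristic being discussed. It is written in Python purely for illustration and is not the code in the commit above. The function name `looks_binary`, the byte sets, and the exact cutoffs (one third for the existing-style rule, 20% for the proposed one) are assumptions chosen for the example, and the UTF-8 step is left out entirely.

```python
# Hypothetical illustration only; this is NOT Perl's implementation of -T/-B.

# Assumption: bytes counted as "text" are printable ASCII plus a few
# common control characters (BS, TAB, LF, FF, CR, ESC).
ASCII_TEXT = frozenset(range(0x20, 0x7F)) | {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}

# The proposal in the thread: also accept the 95 upper Latin1 printable
# characters, 0xA0 through 0xFE (0xFF excluded).
LATIN1_PRINTABLE = frozenset(range(0xA0, 0xFF))


def looks_binary(block, text_bytes=ASCII_TEXT, max_odd_ratio=1 / 3):
    """Return True if the block would be classed as binary by this sketch."""
    if not block:
        return False  # simplification: an empty block is treated as text
    if 0 in block:
        return True  # a NUL byte marks the block as binary outright
    odd = sum(1 for b in block if b not in text_bytes)
    return odd > max_odd_ratio * len(block)


# Latin-1 encoded accented letters separated by one space (11 bytes).
latin1_sample = b"\xe0\xe9\xee\xf5\xfc \xc0\xc9\xce\xd5\xdc"
# C1 control range, which neither byte set accepts.
c1_sample = bytes(range(0x80, 0xA0))

# Existing-style rule: high bytes are odd, cutoff at one third.
print(looks_binary(latin1_sample))
# Proposed rule: expanded text set, lower cutoff (20% assumed here).
print(looks_binary(latin1_sample, ASCII_TEXT | LATIN1_PRINTABLE, 0.20))
print(looks_binary(c1_sample, ASCII_TEXT | LATIN1_PRINTABLE, 0.20))
```

Under these assumptions the Latin-1 sample is classed as binary by the first call and as text by the second, while the C1 control bytes stay binary either way. That is the trade-off the quoted proposal describes: a larger text set paired with a stricter cutoff.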
