
[perl #117355] [lu]cfirst don't respect 'use bytes'

From:
Victor Efimov via RT
Date:
August 16, 2013 00:45
Subject:
[perl #117355] [lu]cfirst don't respect 'use bytes'
Message ID:
rt-3.6.HEAD-2552-1376613908-521.117355-15-0@perl.org
On Thu Aug 15 15:46:39 2013, aristotle wrote:

> That is really the last remnant (I think) of The Unicode Bug.

More precisely, it is what perlunicode describes under "When Unicode
Does Not Happen": http://perldoc.perl.org/perlunicode.html

> And however useful it may be while the bug persists, a workaround is
> all it is. It *isn’t* a legitimately good use case for bytes.pm.

Yes, workarounds are needed until the issue is fixed.

And probably until all the CPAN code that misuses UTF-8 is fixed. I found
several examples (some of them are never going to be fixed, because the
authors refuse to do so).

https://rt.cpan.org/Public/Bug/Display.html?id=87863
https://rt.cpan.org/Public/Bug/Display.html?id=87807
https://rt.cpan.org/Public/Bug/Display.html?id=30271
https://github.com/akarelas/xml-myxml/issues/2


> I have no idea what the concept of assertions or that of unit tests
> has to do with the internal representation of strings

> OK, to cut a long story short, the line is
> ...

Exactly. Your explanation is correct.

> This is not a bug, though it certainly is suboptimal.

Of course I agree that this is a feature, not a bug.
The point was that it is suboptimal.
That is why I need to check for it in assertions and unit tests (using
it in unit tests goes against what bytes.pm says: "use only for
debugging purposes").
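
A minimal sketch of the kind of assertion meant here, assuming the check
is simply "character length differs from internal byte length". The
helper name is_upgraded_binary is hypothetical, not something from
bytes.pm or from the code discussed in this ticket:

```perl
use strict;
use warnings;
use bytes ();    # loads bytes::length without enabling byte semantics

# Hypothetical helper: true when the string is stored internally as
# UTF-8 and contains at least one character above 0x7F, so that its
# internal byte length differs from its character length.
# An upgraded pure-ASCII string is not detected by this check.
sub is_upgraded_binary {
    my ($str) = @_;
    return length($str) != bytes::length($str);
}

my $bin  = "\xf1\xf2\xf3";
my $copy = $bin;
utf8::upgrade($copy);    # same characters, different internal storage

print is_upgraded_binary($bin)  ? "upgraded\n" : "plain\n";
print is_upgraded_binary($copy) ? "upgraded\n" : "plain\n";
```

In an assertion this could be used as
"die if is_upgraded_binary($filename)" right before the string is
handed to open or to another byte-oriented interface.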

> encoding::warnings

It seems a great module, or at least a great idea. However, in my case
it does not work, or I have misunderstood its usage: it does not catch
the error, and it actually silently fixes the "Unicode bug" with
filenames in perl, i.e. with this pragma the program behaves differently.

=======================
use Encode;
use utf8;
use strict;
use warnings;
use bytes (); # load bytes::length without enabling byte semantics
my $u = "\x{442}\x{435}\x{441}\x{442}"; # same as "тест"
my $bin = "\xf1\xf2\xf3";
my $ascii = "x";
my ($ascii_u, undef) = split(/ /, "$ascii $u");

print "original bin length:\t";
print length($bin) . "\t" . bytes::length($bin) ."\n";

my $bin_a = do {
 use encoding::warnings 'FATAL';
 $bin.$ascii;
};

print "bin_a length:\t";
print length($bin_a) . "\t" . bytes::length($bin_a) ."\n";

my $bin_u = $bin.$ascii_u; # THIS LINE CONTAINS A BUG

die unless $bin_u eq $bin_a;
print "bin_u and bin_a are same!\n";


use Devel::Peek;
Dump $bin_a; Dump $bin_u;

open my $f, ">", "$bin_u.tmp" or die "cannot create file: $!";
binmode $f;
syswrite $f, "TEST";
close $f;

open $f, "<", "$bin_a.tmp" or die "file not found $!";
=======================

> because if you try to catch it manually, you will miss places where
> you would need to put checks
> Also, if you *already know* (some of) the places

That is the idea of unit tests: to catch bugs in known places.

> utf8::downgrade($bin, 1);
> utf8::downgrade($ascii_u, 1);
> my $bin_u = $bin.$ascii_u; # THIS LINE NO LONGER CONTAINS A BUG

Yes, this is a fix for the bug. Now I need to unit test the fix with
bytes::length or encoding::warnings (i.e. the practice of writing a test
after a bug is found).
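
Such a regression test could look roughly like the sketch below. It
assumes Test::More, and the function join_bytes is a hypothetical
stand-in for the code under test. The explicit utf8::upgrade stands in
for the flag that $ascii_u picked up from split in the example above:

```perl
use strict;
use warnings;
use bytes ();                 # loads bytes::length without enabling the pragma
use Test::More tests => 3;

# Hypothetical function under test: downgrades both parts before
# joining them, so the result stays a plain byte string.
sub join_bytes {
    my ($bin, $suffix) = @_;
    utf8::downgrade($bin, 1);       # true second argument: do not die on failure
    utf8::downgrade($suffix, 1);
    return $bin . $suffix;
}

my $bin     = "\xf1\xf2\xf3";
my $ascii_u = "x";
utf8::upgrade($ascii_u);            # ASCII content, but UTF8 flag is on

my $buggy = $bin . $ascii_u;        # concatenation upgrades the binary part
my $fixed = join_bytes($bin, $ascii_u);

is(bytes::length($buggy), 7, 'unfixed concatenation is stored as 7 bytes');
is(bytes::length($fixed), 4, 'fixed concatenation is stored as 4 bytes');
ok(!utf8::is_utf8($fixed),   'fixed result is not flagged as UTF-8');
```

The first check documents the bug itself (three high bytes stored as two
bytes each, plus one ASCII byte); the other two fail if the downgrade is
ever removed from the function.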

> This is exactly the same bug as in your first comment on this issue

Yes, I reposted a slightly reworked example when answering another comment.



---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=117355
