[perl #117355] [lu]cfirst don't respect 'use bytes'

Victor Efimov via RT
August 16, 2013 00:45
On Thu Aug 15 15:46:39 2013, aristotle wrote:

> That is really the last remnant (I think) of The Unicode Bug.

More precisely, "When Unicode Does Not Happen".

> And however useful it may be while the bug persists, a workaround is
> all it is. It *isn’t* a legitimately good use case for

Yes, workarounds are needed until the issue is fixed.

And probably until all the CPAN code that misuses UTF-8 is fixed. I found
several examples, some of which are never going to be fixed because the
authors refuse to do so.

> I have no idea what the concept of assertions or that of unit tests
> has to do with the internal representation of strings

> OK, to cut a long story short, the line is
> ...

Exactly. Your explanation is correct.

> This is not a bug, though it certainly is suboptimal.

Of course, I agree that this is a feature, not a bug.
My point was that it is suboptimal.
That is why I need to check it in assertions and unit tests ("unit
tests" is opposite to what is said in "use only for debugging

> encoding::warnings

It seems like a great module, or at least a great idea. However, in my case
either it does not work or I have misunderstood its usage: it does not catch
the error, and it actually silently fixes the "Unicode bug" with filenames in
Perl, i.e. the program behaves differently with this pragma enabled:

use Encode;
use utf8;
use strict;
use warnings;
use bytes (); # makes bytes::length available without enabling byte semantics
my $u = "\x{442}\x{435}\x{441}\x{442}"; # same as "тест"
my $bin = "\xf1\xf2\xf3";
my $ascii = "x";
my ($ascii_u, undef) = split(/ /, "$ascii $u");

print "original bin length:\t";
print length($bin) . "\t" . bytes::length($bin) ."\n";

my $bin_a = do {
 use encoding::warnings 'FATAL';
 $bin.$ascii_u; # assumed: the block body is missing from the archived post
};

print "bin_a length:\t";
print length($bin_a) . "\t" . bytes::length($bin_a) ."\n";

my $bin_u = $bin.$ascii_u; # THIS LINE CONTAINS A BUG

die unless $bin_u eq $bin_a;
print "bin_u and bin_a are same!\n";

use Devel::Peek;
Dump $bin_a; Dump $bin_u;

open my $f, ">", "$bin_u.tmp" or die "cannot create file: $!";
binmode $f;
syswrite $f, "TEST";
close $f;

open $f, "<", "$bin_a.tmp" or die "file not found $!";
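
A minimal sketch of why an `eq` check like the one above can pass while two open calls disagree: `eq` compares characters, whereas the filename handed to the operating system is the string's internal byte buffer. The variable names here are illustrative and not part of the original script; utf8::upgrade is used only to simulate an ASCII string that has picked up the UTF8 flag.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics

my $bin     = "\xf1\xf2\xf3";   # three bytes, UTF8 flag off
my $flagged = "x";
utf8::upgrade($flagged);        # same character, but UTF8 flag now on

my $plain    = $bin . "x";        # stays a byte string
my $upgraded = $bin . $flagged;   # concatenation upgrades the whole result

print "equal as characters\n" if $plain eq $upgraded;

# Internal byte lengths differ: each of \xf1 \xf2 \xf3 takes two bytes once upgraded.
printf "%d %d\n", bytes::length($plain), bytes::length($upgraded);   # 4 7

# Only the second string carries the UTF8 flag.
printf "%d %d\n", (utf8::is_utf8($plain) ? 1 : 0), (utf8::is_utf8($upgraded) ? 1 : 0);   # 0 1
```

Two strings that compare equal can therefore name two different files on disk.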

> because if you try to catch it manually, you will miss places where
> you would need to put checks
> Also, if you *already know* (some of) the places

That is the idea of unit tests: to catch bugs in known places.

> utf8::downgrade($bin, 1);
> utf8::downgrade($ascii_u, 1);
> my $bin_u = $bin.$ascii_u; # THIS LINE NO LONGER CONTAINS A BUG

Yes, this is a fix for the bug. Now I need to unit test the fix with
bytes::length or encoding::warnings (i.e. the practice of writing a test
after a bug is found).
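
A regression test for such a fix might look like the following sketch. It uses Test::More together with utf8::is_utf8 and bytes::length; the helper name `join_filename` is hypothetical and stands in for whatever code actually builds the filename.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics
use Test::More tests => 3;

# Hypothetical helper standing in for the code under test.
sub join_filename {
    my ($bin, $suffix) = @_;
    # Downgrade the local copies so the result stays a byte string.
    # The second argument (1) makes downgrade return false instead of dying
    # when a string contains characters above 0xFF.
    utf8::downgrade($bin, 1);
    utf8::downgrade($suffix, 1);
    return $bin . $suffix;
}

my $bin     = "\xf1\xf2\xf3";
my $ascii_u = "x";
utf8::upgrade($ascii_u);   # simulate an ASCII string that picked up the UTF8 flag

my $name = join_filename($bin, $ascii_u);

ok(!utf8::is_utf8($name), 'result is not flagged as UTF-8');
is(length($name), 4, 'result has four characters');
is(bytes::length($name), length($name), 'byte length equals character length');
```

Comparing bytes::length with length is what detects an unexpected upgrade here: the two values differ as soon as a character in the range 0x80 to 0xFF is stored in upgraded form.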

> This is exactly the same bug as in your first comment on this issue

Yes, I reposted a slightly reworked example when answering another comment.

via perlbug:  queue: perl5 status: open
