
Re: [perl #130655] Bleadperl v5.25.8-68-g94749a5ed2 breaks MAUKE/Quote-Ref-0.03.tar.gz

From: ilmari
Date: February 3, 2017 01:24
Subject: Re: [perl #130655] Bleadperl v5.25.8-68-g94749a5ed2 breaks MAUKE/Quote-Ref-0.03.tar.gz
Message ID: d8j37fz9yjq.fsf@dalvik.ping.uio.no

"James E Keenan via RT" <perlbug-followup@perl.org> writes:

> On Fri, 27 Jan 2017 07:45:57 GMT, andreas.koenig.7os6VVqR@franz.ak.mind.de wrote:
>> bisect
>> ------
>> commit 94749a5ed2171bb6de72e384a78f5df552d812bb
>> Author: Karl Williamson <khw@cpan.org>
>> Date:   Tue Dec 20 13:41:58 2016 -0700
>> 
>> Deprecate non-grapheme string delimiter
>> 
>> diagnostics
>> -----------
>> Wide character in print at t/03-unicode.t line 12.
>> # Looks like your test exited with 2 before it could output anything.
>>  t/03-unicode.t ..
>> Dubious, test returned 2 (wstat 512, 0x200)
>>  Failed 6/6 subtests
>> 
>
> This test fails because the author has enabled fatalization of warnings in tests.

That's just masking the real error message, which happens to contain a
wide character.  If you disable the fatal warnings, you get:

> #####
>   1 use warnings;
>   2 use strict;
>   3 use utf8;
>   4 
>   5 use Test::More tests => 6;
>   6 
>   7 use Quote::Ref;
>   8 
>   9 is_deeply qwa foo bar baz , [qw foo bar baz ];
>  10 is_deeply qwh foo bar baz " , {qw foo bar baz " };
>  11 
>  12 is_deeply qwa foo     bar , [qw foo     bar ];
>  13 is_deeply qwh foo     bar , {qw foo     bar };
> ...
> #####

ilmari@garkbit:~/.cpanm/work/1485526294.17204/Quote-Ref-0.03$ prove -bv t/03-unicode.t
t/03-unicode.t ..
1..6
Wide character in print at t/03-unicode.t line 12.
Unrecognized character \x{2665}; marked by <-- HERE after [qw foo   <-- HERE near column 39 at t/03-unicode.t line 12.
# Looks like your test exited with 2 before it could output anything.
Dubious, test returned 2 (wstat 512, 0x200)
Failed 6/6 subtests
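
The masking effect can be sketched in isolation. This is only an illustration, under the assumption that the general mechanism is a fatalized "Wide character" warning firing before the message itself is written. In the real test the message comes from perl at compile time, which this sketch does not model. The helper name print_wide_fatally and the temporary file are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: print a message containing a wide character to a
# handle that has no encoding layer, with the 'utf8' warning category
# made fatal in this lexical scope.
sub print_wide_fatally {
    my ($fh) = @_;
    use warnings FATAL => 'utf8';
    print {$fh} "real message containing \x{2665}\n";
    return 1;
}

# tempfile() returns a plain handle without a :utf8 layer.
my ($fh, $path) = tempfile(UNLINK => 1);

my $ok  = eval { print_wide_fatally($fh) };
my $err = $@;

# With the warning fatalized, the print dies with the warning text, so
# the "real message" is never written.
print $ok ? "printed\n" : "died: $err";
```

Without the FATAL in print_wide_fatally, the same print would only warn and the message would still be written.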

This reduces to the following:

$ perl5.25.9 -CS -Mutf8 -wE 'say qw foo     bar '
Unrecognized character \x{2665}; marked by <-- HERE after  qw foo   <-- HERE near column 14 at -e line 1.

Which is a regression:

$ perl5.25.8 -CS -Mutf8 -wE 'say qw foo     bar '
foo  bar

In fact it seems like if the opening delimiter is above U+100, any
closing delimiter in the same U+x000 range matches, until we get to
U+10000, above which even cross-range delimiters match.

    #!/usr/bin/env perl

    use utf8;
    use strict;
    use warnings;
    use open qw(:std :utf8);
    use experimental qw(regex_sets);
    use feature qw(unicode_eval);

    my @delims = map {
        my $s = $_ * 0x1000;
        my $e = $s + 0xfff;
        # Get the first two acceptable delimiters in this block
        my ($o, $c) = grep /(?[ \p{Assigned} & !(
            \p{Letter} | \p{Number} | \p{Space} |
            \p{Nonspacing_Mark} | \p{Spacing_Mark} | \p{Format} |
            \p{Private_Use}
        ) ])/x,
            map chr, $s..$e;
        defined $o && defined $c
            ? ($o, $c)
            : ();
    } 0..0xff;

    splice @delims, 2, 0, "\N{U+2C2}", "\N{U+2F5}"; # between U+100 and U+1000

    print "perl $]\n";
    for my $i (0..$#delims-1) {
        my ($o, $c) = @delims[$i, $i+1];

        my $ok = eval "my \$x = q${o}foo${c}" ? "not ok" : "ok    ";
        # $ok is padded ("ok    "), so match the prefix instead of using eq
        warn "$@" if $ok =~ /^ok/ and $@ !~ /string terminator/;
        printf "$ok - U+%04X U+%04X\n", ord $o, ord $c;
    }

On perl 5.25.9, we get the following failures:

perl 5.025009
ok     - U+0000 U+0001
ok     - U+0001 U+02C2
not ok - U+02C2 U+02F5
ok     - U+02F5 U+104A
not ok - U+104A U+104B
ok     - U+104B U+2010
not ok - U+2010 U+2011
ok     - U+2011 U+3001
not ok - U+3001 U+3002
ok     - U+3002 U+4DC0
not ok - U+4DC0 U+4DC1
ok     - U+4DC1 U+A490
not ok - U+A490 U+A491
ok     - U+A491 U+D800
not ok - U+D800 U+D801
ok     - U+D801 U+FB29
not ok - U+FB29 U+FBB2
ok     - U+FBB2 U+10100
not ok - U+10100 U+10101
not ok - U+10101 U+11047
not ok - U+11047 U+11048
not ok - U+11048 U+12470
not ok - U+12470 U+12471
not ok - U+12471 U+16A6E
not ok - U+16A6E U+16A6F
not ok - U+16A6F U+1BC9C
not ok - U+1BC9C U+1BC9F
not ok - U+1BC9F U+1D000
not ok - U+1D000 U+1D001
not ok - U+1D001 U+1E95E
not ok - U+1E95E U+1E95F
not ok - U+1E95F U+1F000
not ok - U+1F000 U+1F001
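
The pattern in these failures can be restated as a small predicate. This is a sketch of the observed behaviour only, not of perl's actual delimiter-matching code. The function name wrongly_pairs is hypothetical, and the treatment of cases the run above does not cover (for example an opener at or above U+10000 with a closer below it) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical predicate: true if 5.25.9 appears to accept $close as the
# terminator for a string opened with $open, going by the results above.
sub wrongly_pairs {
    my ($open, $close) = @_;
    return 0 if $open < 0x100;                          # low openers behave
    return 1 if $open >= 0x10000 && $close >= 0x10000;  # cross-range matches
    return ($open >> 12) == ($close >> 12) ? 1 : 0;     # same 0x1000 block
}

# A few pairs taken from the output above.
my @pairs = (
    [0x0001,  0x02C2],
    [0x02C2,  0x02F5],
    [0x02F5,  0x104A],
    [0xFBB2,  0x10100],
    [0x10101, 0x11047],
);

for my $p (@pairs) {
    my $status = wrongly_pairs(@$p) ? "not ok" : "ok    ";
    printf "%s - U+%04X U+%04X\n", $status, @$p;
}
```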


While on perl 5.25.8 all is good:

$ ~/tmp/delimwtf.pl
ok     - U+0000 U+0001
ok     - U+0001 U+02C2
ok     - U+02C2 U+02F5
ok     - U+02F5 U+104A
ok     - U+104A U+104B
ok     - U+104B U+2010
ok     - U+2010 U+2011
ok     - U+2011 U+3001
ok     - U+3001 U+3002
ok     - U+3002 U+4DC0
ok     - U+4DC0 U+4DC1
ok     - U+4DC1 U+A490
ok     - U+A490 U+A491
ok     - U+A491 U+D800
ok     - U+D800 U+D801
ok     - U+D801 U+FB29
ok     - U+FB29 U+FBB2
ok     - U+FBB2 U+10100
ok     - U+10100 U+10101
ok     - U+10101 U+11047
ok     - U+11047 U+11048
ok     - U+11048 U+12470
ok     - U+12470 U+12471
ok     - U+12471 U+16A6E
ok     - U+16A6E U+16A6F
ok     - U+16A6F U+1BC9C
ok     - U+1BC9C U+1BC9F
ok     - U+1BC9F U+1D000
ok     - U+1D000 U+1D001
ok     - U+1D001 U+1E95E
ok     - U+1E95E U+1E95F
ok     - U+1E95F U+1F000
ok     - U+1F000 U+1F001

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.               - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.                      - Calle Dybedahl
