perl.perl5.porters | Postings from March 2015

[perl #123861] check_locale_boundary_crossing assertion failure

From: Hugo van der Sanden via RT
Date: March 1, 2015 10:22
Subject: [perl #123861] check_locale_boundary_crossing assertion failure
Message ID: rt-4.0.18-25196-1425205369-361.123861-15-0@perl.org
On Sat Feb 28 20:43:06 2015, public@khwilliamson.com wrote:
> On 02/22/2015 06:59 PM, Hugo van der Sanden via RT wrote:
> > On Sun Feb 22 11:20:18 2015, sprout wrote:
> >> On Tue Feb 17 18:22:58 2015, hv wrote:
> >>> Here's a variant that triggers a different assert:
> >>>
> >>> % ./perl -Ilib -e '"0\7000"=~m{\C+?0}'
> >>> \C is deprecated in regex; marked by <-- HERE in m/\C <-- HERE +?0/ at -e line 1.
> >>> perl: regexec.c:6606: S_regmatch: Assertion `n == (32767) || locinput == li' failed.
> >>> Aborted (core dumped)
> >>>   %
> >>
> >> I have a debugging 5.14.4 installed, and it doesn’t fail the
> >> assertion.  When was this bug introduced?
> >
> > The first case (/\C0/il) bisects to quite recently:
> >
> > commit 1d39b2cd2a278ed0630f07bd7598726910eb6427
> > Author: Karl Williamson <khw@cpan.org>
> > Date:   Fri Dec 26 18:31:04 2014 -0700
> >
> > Simplify foldEQ_utf8
> >
> > This moves the uncommon case of handling inputs under non-UTF-8 locales
> > out of this function to the functions it calls, which already have the
> > logic to handle it.  This simplifies this function, cutting a couple
> > branches each time through the loop from the common usage.
> >
> > The locale handling is slowed down somewhat, but even if that were a
> > concern, another simpler function is normally used for locale handling.
> > This gets called only when one or both of the comparison strings is
> > UTF-8, which should be comparatively rare for non-UTF8 locales.
> 
> This bisect doesn't really mean anything.  Things were added that didn't
> need this assert before.
> 
> I'm having trouble reproducing it.  What locale is in effect?

C locale:

% LC_ALL=C ./perl -Ilib -e '"\700" =~ /\C0/il'
\C is deprecated in regex; marked by <-- HERE in m/\C <-- HERE 0/ at -e line 1.
perl: utf8.c:1890: S_check_locale_boundary_crossing: Assertion `((U8)(*p) >= 0xc4)' failed.
Aborted (core dumped)
% 

In utf8.c:Perl__to_utf8_fold_flags, it's assumed that p points at the start of a well-formed character. In this case, however, \C has already consumed the first octet of [\xc7 \x80] (the UTF-8 encoding of "\700", U+01C0), so we're calling it with p pointing at the second octet. That octet fails both UTF8_IS_INVARIANT(*p) and UTF8_IS_DOWNGRADEABLE_START(*p), so we fall through to:
    else {  /* utf8, ord above 255 */

Here's the full stack trace:

perl: utf8.c:1890: S_check_locale_boundary_crossing: Assertion `((U8)(*p) >= 0xc4)' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff70e9bb9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x00007ffff70e9bb9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff70ecfc8 in __GI_abort () at abort.c:89
#2  0x00007ffff70e2a76 in __assert_fail_base (
    fmt=0x7ffff7234370 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x80c65e "((U8)(*p) >= 0xc4)", 
    file=file@entry=0x80bf15 "utf8.c", line=line@entry=1890, 
    function=function@entry=0x80dd20 <__PRETTY_FUNCTION__.14465> "S_check_locale_boundary_crossing") at assert.c:92
#3  0x00007ffff70e2b22 in __GI___assert_fail (
    assertion=0x80c65e "((U8)(*p) >= 0xc4)", file=0x80bf15 "utf8.c", 
    line=1890, 
    function=0x80dd20 <__PRETTY_FUNCTION__.14465> "S_check_locale_boundary_crossing") at assert.c:101
#4  0x00000000006d7499 in S_check_locale_boundary_crossing (p=0xa64751 "\200", 
    result=128, ustrp=0x7fffffffd510 "\200\325\377\377\377\177", 
    lenp=0x7fffffffd4b8) at utf8.c:1890
#5  0x00000000006d828f in Perl__to_utf8_fold_flags (p=0xa64751 "\200", 
    ustrp=0x7fffffffd510 "\200\325\377\377\377\177", lenp=0x7fffffffd4b8, 
    flags=3 '\003') at utf8.c:2229
#6  0x00000000006e159d in Perl_foldEQ_utf8_flags (s1=0xa646c8 "0", pe1=0x0, 
    l1=1, u1=false, s2=0xa64751 "\200", pe2=0x7fffffffd668, l2=0, u2=true, 
    flags=2) at utf8.c:4084
#7  0x00000000006c38a3 in S_regmatch (reginfo=0x7fffffffe1a0, 
    startpos=0xa64750 "\307\200", prog=0xa646c0) at regexec.c:5473
#8  0x00000000006bc227 in S_regtry (reginfo=0x7fffffffe1a0, 
    startposp=0x7fffffffdd48) at regexec.c:3492
#9  0x00000000006b1a3c in S_find_byclass (prog=0xa649d0, c=0xa646c0, 
    s=0xa64750 "\307\200", strend=0xa64752 "", reginfo=0x7fffffffe1a0)
    at regexec.c:1809
#10 0x00000000006bb5a8 in Perl_regexec_flags (rx=0xa614f0, 
    stringarg=0xa64750 "\307\200", strend=0xa64752 "", 
    strbeg=0xa64750 "\307\200", minend=0, sv=0xa61430, data=0x0, flags=97)
    at regexec.c:3244
#11 0x00000000005988f1 in Perl_pp_match () at pp_hot.c:1486
#12 0x0000000000545c42 in Perl_runops_debug () at dump.c:2237
#13 0x0000000000460555 in S_run_body (oldscope=1) at perl.c:2427
#14 0x000000000045fb99 in perl_run (my_perl=0xa42010) at perl.c:2350
#15 0x000000000041eee5 in main (argc=4, argv=0x7fffffffe638, 
    env=0x7fffffffe660) at perlmain.c:116
(gdb)

Hugo

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=123861


