
[perl #130010] v5.25.5-184-ga5540cf breaks texinfo

From:
James E Keenan via RT
Date:
November 9, 2016 00:15
Subject:
[perl #130010] v5.25.5-184-ga5540cf breaks texinfo
Message ID:
rt-4.0.24-11626-1478650502-1852.130010-15-0@perl.org
On Tue, 08 Nov 2016 03:44:25 GMT, jkeenan wrote:
> On Mon, 07 Nov 2016 22:14:43 GMT, demerphq wrote:
> > On 7 November 2016 at 17:29, James E Keenan via RT
> > <perlbug-followup@perl.org> wrote:
> > > On Mon, 07 Nov 2016 13:47:55 GMT, jkeenan wrote:
> > >>
> > >> The question is:  What is it about this pattern:
> > >>
> > >> #####
> > >> /([^\S\x{202f}\x{00a0}]+)|(\p{InFullwidth})|((?:[^\s\p{InFullwidth}]|[\x{202f}\x{00a0}])+)/
> > >> #####
> > >>
> > >> ... that (a) as of commit C<a5540cf> but not previously; and (b) in
> > >> the context of this test suite but not in isolation, perceives
> > >> something to be a read-only value not subject to modification?
> > >
> > > My next brainstorm:  Add "use re 'debug';" to sub add_text() in
> > > tp/Texinfo/Convert/ParagraphNonXS.pm.
> > >
> > > When I did so and ran the debugging program found in one of my
> > > previous attachments, I got this output:
> > >
> > > #####
> > > Texinfo::Convert::ParagraphNonXS::add_text(../../tp/Texinfo/Convert/ParagraphNonXS.pm:329):
> > > 329:      my @segments = split
> > > 330:
> > > /([^\S\x{202f}\x{00a0}]+)|(\p{InFullwidth})|((?:[^\s\p{InFullwidth}]|[\x{202f}\x{00a0}])+)/,
> > > 331:        $text;
> > >   DB<6> n
> > > Matching REx
> > > "([^\S\x{202f}\x{00a0}]+)|(\p{InFullwidth})|((?:[^\s\p{InFull"...
> > > against "This is "
> > >    0 <> <This is >           |   0| 1:BRANCH(18)
> > >    0 <> <This is >           |   1|  2:OPEN1(4)
> > >    0 <> <This is >           |   1|  4:PLUS(16)
> > >                              | 1|  ANYOF[\t\n\x0B\f\r \x85][1680
> > >                              | 2000-200A 2028-2029 205F 3000] can
> > >                              | match 0 times out of 2147483647...
> > >                              | 1|  failed...
> > > 0 <> <This is >           |   0| 18:BRANCH(34)
> > > 0 <> <This is >           |   1|  19:OPEN2(21)
> > > 0 <> <This is >           |   1|
> > > 21:ANYOF[+utf8::Texinfo::Convert::ParagraphNonXS::InFullwidth](32)
> > >                              | 1|  failed...
> > > 0 <> <This is >           |   0| 34:BRANCH(68)
> > > 0 <> <This is >           |   1|  35:OPEN3(37)
> > > 0 <> <This is >           |   1|  37:CURLYM[0]{1,INFTY}(66)
> > > 0 <> <This is >           |   2|   39:BRANCH(51)
> > > 0 <> <This is >           |   3|    40:ANYOF[^\t\n\x0B\f\r
> > > \x85\xA0{+utf8::Texinfo::Convert::ParagraphNonXS::InFullwidth}1680
> > > 2000-200A 2028-2029 202F 205F 3000](64)
> > > Modification of a read-only value attempted at
> > > ../../tp/Texinfo/Convert/ParagraphNonXS.pm line 329.
> > > at ../../tp/Texinfo/Convert/ParagraphNonXS.pm line 329.
> > >      Texinfo::Convert::ParagraphNonXS::add_text(Texinfo::Convert::ParagraphNonXS=HASH(0x35ba938),
> > > "This is ") called at ../../tp/Texinfo/Convert/Info.pm line 308
> > >      Texinfo::Convert::Info::_info_header(Texinfo::Convert::Info=HASH(0x35b3038))
> > > called at ../../tp/Texinfo/Convert/Info.pm line 81
> > >      Texinfo::Convert::Info::output(Texinfo::Convert::Info=HASH(0x35b3038),
> > > HASH(0x35ab7c0)) called at ../texi2any.pl line 1348
> > > panic: POPSTACK
> > > Debugged program terminated.
> > > #####
> > >
> > > Since I have never previously used the regex debugger, I have no idea
> > > if there are any clues to a solution in that output.
> > >
> > > Thoughts?
> >
> > It looks like the pattern
> >
> > [^\s\p{InFullwidth}]
> >
> > causes this.
> >
> > Yves
> 
> Very likely.  When, at an earlier stage of debugging, I removed the
> two parts of the pattern that contained '\p{InFullwidth}', the panic
> disappeared.
> 
> However, that pattern was not panicking up until commit a5540cf.  And,
> even with a5540cf and commits thereafter (e.g., HEAD of blead), I can
> write a very simple perl program that has that pattern and not get a
> panic.
> 
> It's something about the interaction of commit a5540cf, that pattern,
> and the code in texinfo that is the problem.
> 
> Thanks for taking a look at this.

I am attaching a program, 'pseudo_east_asian_width.pl', which may be useful in resolving this problem.

Once we solve the problem, we have to be able to write tests that demonstrate that solution and fail in the event of regressions.  To write such a test in the core distribution we won't be able to say:

#####
use Unicode::EastAsianWidth;
#####

... but we will need to simulate enough of that module's functionality to write a realistic test.

In the attachment, 'package gamma' (the name is purely provisional) simulates much of Unicode::EastAsianWidth.  'package main' exercises gamma's 'InPseudoFullwidth' user-defined regex property in a way similar (I think) to what Unicode::EastAsianWidth does.  If I go into the two parts of the texinfo library where mat and I have identified failures -- Texinfo::Convert::Line and Texinfo::Convert::ParagraphNonXS -- replace calls to Unicode::EastAsianWidth with calls to gamma, and then run the texi2any.pl program with the perl from the "bad" commit point, I can reproduce the panic condition.

I haven't yet been able to generate the panic condition outside the texi2any.pl program, though.
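
For readers without the attachment, here is a minimal sketch of the general technique: a package that defines a user-defined regex property, plus a 'package main' that uses it in the split pattern from ParagraphNonXS.pm.  This is not the contents of 'pseudo_east_asian_width.pl'.  The two code point ranges, the sample text, and the variable names are assumptions chosen for illustration, and only the names 'gamma' and 'InPseudoFullwidth' come from the message above.  Run in isolation like this, it is not expected to trigger the panic.

```perl
use strict;
use warnings;

package gamma;    # provisional name from the message; the body below is assumed

# A user-defined property is a sub whose name begins with "In" or "Is" and
# which returns newline-separated ranges of hexadecimal code points.
# Assumption: only two fullwidth ranges are listed, to keep the sketch short.
sub InPseudoFullwidth {
    return "FF01 FF60\nFFE0 FFE6\n";
}

package main;

binmode( STDOUT, ':encoding(UTF-8)' );

# Hypothetical sample text: ASCII words around two fullwidth Latin letters.
my $text = "This is \x{FF21}\x{FF22} wide";

# Same shape as the pattern in add_text(), with the property package-qualified
# because it is referenced from outside package gamma.
my @segments = split
    /([^\S\x{202f}\x{00a0}]+)|(\p{gamma::InPseudoFullwidth})|((?:[^\s\p{gamma::InPseudoFullwidth}]|[\x{202f}\x{00a0}])+)/,
    $text;

# split returns undef for capture groups that did not take part in a match,
# and empty strings between adjacent matches; drop both before printing.
my @tokens = grep { defined && length } @segments;
print "[$_]\n" for @tokens;
```

A core regression test could not depend on Unicode::EastAsianWidth, so it would presumably need something of this shape, plus whatever extra ingredient from texinfo turns out to be required to provoke the failure.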

Thank you very much.

-- 
James E Keenan (jkeenan@cpan.org)

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=130010
