On Sat, Dec 25, 2004 at 09:50:46PM -0000, sroy @ search-box. com wrote:

> The breakpoint stops the first time perl needs to check whether a
> utf8 character is part of a string class. At this point (step #5) everything
> is ok. By step #6 the value of PL_bostr (my_perl->Tbostr) is corrupted.
> To see more details, instead of c at step #6 do:
>
> 6. fin
> 7. s 4
>
> Now the debugger is sitting at the line that corrupts prog->startp.
> Ultimately, this corruption leads to a seg fault at pp_hot.c:2151 when perl
> tries to copy characters as part of the s/// operation.

I can recreate this on OS X when running with the perl debugger. I can't
recreate it on FreeBSD (on a box where valgrind has been installed) and
annoyingly the x86 Linux box I usually use for this sort of thing is
currently inaccessible.

> ANALYSIS:
>
> In the middle of processing the regular expression, The regex library
> demand-loads a bunch of stuff to create the swashes for the [:print:]
> expression. At the end of all that PL_bostr has a completely new value.
> I have no idea whether the right fix is to move away from using PL_bostr
> in the regex library in favor of some local variable, or to try and
> save PL_bostr and restore it before any line that might change it.

Thanks for the analysis, which seems to be spot on. (Seems, because I'm no
expert on the regexp engine's guts).

Ideally we'd really like to re-write the regexp engine sufficiently to
remove all the global state, and hence make it totally re-entrant.
Currently no-one with the expertise to do this has the time.

Currently there are kludges to save enough state to theoretically make the
utf8 initialisation work:

    /* XXX Here's a total kludge.  But we need to re-enter for swash routines. */
    void
    Perl_save_re_context(pTHX)
    {
        SAVEI32(PL_reg_flags);      /* from regexec.c */
        SAVEPPTR(PL_bostr);
        SAVEPPTR(PL_reginput);      /* String-input pointer. */

but what doesn't make sense to me is why PL_bostr isn't being saved (or
maybe isn't being restored) via the code path that your code takes.

The realistic fix is going to be to make it save and restore correctly for
the class of operations that your code represents. I don't have the
experience with the regexp engine to know where to look to quickly find the
correct solution, but I believe that several other people on the
perl5-porters mailing list do.

Nicholas Clark