Re: connections hanging around forever
From: Chris Garrigues
Date: August 31, 2007 11:25
Subject: Re: connections hanging around forever
Message ID: 1188584694.29129.TMDA@io.trinsics.com
> From: Chris Garrigues <cwg-bcc@Trinsics.Com>
> Date: Fri, 31 Aug 2007 13:09:52 -0500
>
> > From: Charlie Brady <charlieb-qpsmtpd@budge.apana.org.au>
> > Date: Fri, 31 Aug 2007 13:49:02 -0400 (EDT)
> >
> >
> > On Fri, 31 Aug 2007, Chris Garrigues wrote:
> >
> > >> From: Chris Garrigues <cwg-qpsmtpd@Trinsics.Com>
> > >> Date: Wed, 29 Aug 2007 09:27:42 -0500
> > ...
> > >> Any idea what's going on here? It requires a -9 to kill the processes.
> > ...
> > > and then it hangs forever and requires a -9.
> >
> > Are you quite sure of that? What happens if you use a TERM or QUIT signal?
> > Have you attached strace (or whatever syscall tracing tool is appropriate
> > for your platform)?
>
> I just confirmed it. If I strace I just see that it's hanging on "read(0, ".
>
> Since I hadn't diagnosed what was triggering it until just now, I didn't know
> how to provide a test case. Now that I do, I'll try to get a real strace.
>
> > > Am I doing the wrong thing, is this a bug, or is there something odd about my
> > > system?
> >
> > Last time I looked the qpsmtpd timeout alarm only applied while parsing
> > SMTP or while receiving messages, but not while plugins were executing. I
> > haven't seen any discussion about possible fixes for that (but I haven't
> > checked that it hasn't been fixed). That could explain qpsmtpd waiting
> > forever, but wouldn't explain failure to terminate on TERM and QUIT
> > signals.
>
> Note that it's no longer in my plugin at this point.
Okay, I did an strace while telnetting to the smtp port. If I type "quit"
after I get the 550, it does the right thing, but if I just let it sit there,
it never times out. According to the strace:
write(2, "21349 Plugin tarpit, hook deny r"..., 51) = 51
write(2, "21349 550 No such user as utterl"..., 60) = 60
write(1, "550 No such user as utterlybogus"..., 55) = 55
alarm(120) = 0
read(0, 0x8c43798, 4096) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
read(0,
The read following the alarm does indeed wait for 120 seconds, but after the
SIGALRM the read is simply restarted, as if the alarm were blocked....
...and I've just concluded that the problem must be in a library of my own
called by my code. It's code that I wrote years ago...I'll have to figure out
why I blocked the signals that I did when I did.
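For anyone who finds this thread later, here is a minimal sketch of the general mechanism, in Python rather than Perl and not taken from qpsmtpd or from my library. It assumes a POSIX system. The names wait_for_input, ReadTimeout and the safety parameter are made up for illustration, and select() with a safety timeout stands in for the blocking read so the demo can finish. With SIGALRM unblocked, the handler breaks the wait; with SIGALRM blocked, the timer expires but the signal is never delivered.

```python
import os
import select
import signal


class ReadTimeout(Exception):
    """Raised from the SIGALRM handler to abandon a blocking wait."""


def _on_alarm(signum, frame):
    raise ReadTimeout()


def wait_for_input(fd, timeout, block_alarm=False, safety=None):
    """Wait for input on fd under an alarm-style timeout (hypothetical helper).

    Returns "timed out" if SIGALRM interrupted the wait, "ready" if fd
    became readable, or "alarm never delivered" if the safety net expired
    first. A real server has no safety net and would wait forever.
    """
    if safety is None:
        safety = timeout * 3
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    # SIG_BLOCK with an empty set changes nothing and returns the current mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    if block_alarm:
        signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGALRM])
    try:
        # setitimer stands in for alarm(120); it accepts fractional seconds.
        signal.setitimer(signal.ITIMER_REAL, timeout)
        try:
            ready, _, _ = select.select([fd], [], [], safety)
        except ReadTimeout:
            return "timed out"
        return "ready" if ready else "alarm never delivered"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)         # cancel the timer
        signal.signal(signal.SIGALRM, signal.SIG_IGN)   # discard a pending SIGALRM
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    r, w = os.pipe()
    print(wait_for_input(r, 0.2))                    # idle client, alarm works
    print(wait_for_input(r, 0.2, block_alarm=True))  # idle client, alarm blocked
    os.write(w, b"quit\r\n")
    print(wait_for_input(r, 0.2))                    # client typed something
```

This only shows why a blocked SIGALRM defeats an alarm-based timeout. The strace above shows the signal arriving and the read being restarted, so the exact failure in my library may well differ from this sketch.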
Thanks for helping me find the problem in my own code.
Chris
--
Chris Garrigues Trinsic Solutions
President 710-B West 14th Street
Austin, TX 78701-1798
http://www.trinsics.com/blog
http://www.trinsics.com 512-322-0180
Would you rather proactively pay for
uptime or reactively pay for downtime?
Trinsic Solutions
Your Trusted Friends in Proactive IT.