perl.perl5.porters
Re: [MSWin32] I need help with RT#98976
From: Craig A. Berry
October 3, 2014 00:02
Message ID: CA+vYcVyx38j1GSTr_Ki9p4jLiqqtHJndimrb+iND=g993Wyrvw@mail.gmail.com
On Thu, Oct 2, 2014 at 9:12 AM, Paul "LeoNerd" Evans wrote:
> On Thu, 2 Oct 2014 08:38:20 -0500
> "Craig A. Berry" <email@example.com> wrote:
>> It looks like a race condition in IO::Socket::IP::connect, which calls
>> CORE::connect, gets EINPROGRESS, then sets up a select to implement
>> the timeout that waits for the connection to complete. The select
>> always times out because the connection completes in between calling
>> CORE::connect and select so there is never any activity on the socket
>> that would make the select fire. The following seems to fix it:
> I'm not sure I believe that explanation.
I'm not 100% sure I do either but if I'm wrong, then we have to
explain why the $self->connected() call that I added (which is based
on getpeername) returns true if indeed the connection has not been
successfully made as you suppose.
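To make the getpeername point concrete, here is a minimal sketch of what such a check reports on an unconnected socket versus a connected one. It assumes a loopback listener on an ephemeral port; the variable names are made up for illustration, and this is not the change I made to IO::Socket::IP.

```perl
use strict;
use warnings;
use IO::Socket;
use IO::Socket::INET;
use Socket qw(PF_INET SOCK_STREAM);

# A freshly created TCP socket has no peer, so getpeername fails
# (typically with ENOTCONN) and returns undef.
my $fresh = IO::Socket->new;
$fresh->socket(PF_INET, SOCK_STREAM, 0) or die "socket: $!";
my $before = defined getpeername($fresh) ? 1 : 0;

# Loopback listener on an ephemeral port (assumption for the demo).
my $listener = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen: $!";

# An ordinary blocking connect; once it returns, the peer is known.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect: $!";

# IO::Socket's connected() is a thin wrapper around getpeername.
my $after = defined $client->connected ? 1 : 0;

print "before=$before after=$after\n";
```

If getpeername returns a peer address, the kernel considers the connection established, which is why a true result from connected() is hard to square with a connect that is still pending.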
> I'm select()ing for writability or exceptional status. If as you
> suggest, the connect() has already fully succeeded before we enter the
> select(), then the socket is now nicely connected and usable, at which
> point it definitely is writable, so the select() should yield. If
> instead the connect() has now failed, the socket is now in exceptional
> status, so again the select() will yield.
It's surprisingly difficult to find a clear, detailed description of
what "ready to write" actually means, especially for a non-blocking
socket, which is always ready in the sense that it won't block even if
it's busy. But according to the standard at
"If a non-blocking call to the connect() function has been made for a
socket, and the connection attempt has either succeeded or failed
leaving a pending error, the socket shall be marked as writable."
which sounds exactly like what you're doing and really doesn't seem
all that much different from what's in IO::Socket::connect(). I do
wish they would not use the word "writable," which sounds like a
permission attribute rather than a state or readiness attribute, but I
So I agree that what you're describing is how things should work, but
for some reason they don't. Thus reasoning from how it's supposed to
work may not lead to a solution.
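For reference, the pattern under discussion can be sketched roughly as follows: a non-blocking connect, a select for writability or exceptional status, then a look at SO_ERROR to tell success from failure. This is only an illustration under my own assumptions (IPv4 only, a loopback listener for the demo, and a hypothetical helper named connect_with_timeout); it is not the code in IO::Socket::IP or IO::Socket.

```perl
use strict;
use warnings;
use Errno qw(ETIMEDOUT);
use IO::Socket;
use IO::Socket::INET;
use Socket qw(PF_INET SOCK_STREAM SO_ERROR inet_aton pack_sockaddr_in);

# Hypothetical helper for illustration only.
sub connect_with_timeout {
    my ($host, $port, $timeout) = @_;

    my $sock = IO::Socket->new;
    $sock->socket(PF_INET, SOCK_STREAM, 0) or return;
    defined $sock->blocking(0) or return;

    my $addr = pack_sockaddr_in($port, inet_aton($host));

    # The builtin connect may complete at once, or report "in progress".
    return $sock if connect($sock, $addr);
    return unless $!{EINPROGRESS} || $!{EWOULDBLOCK};

    # Wait until the socket is writable or in exceptional status.
    my $wvec = '';
    vec($wvec, fileno($sock), 1) = 1;
    my $evec = $wvec;
    my $nfound = select(undef, $wvec, $evec, $timeout);
    return if $nfound < 0;              # select itself failed
    if ($nfound == 0) {                 # nothing fired before the timeout
        $! = ETIMEDOUT;
        return;
    }

    # Writable only means the attempt finished; SO_ERROR says how.
    my $err = $sock->sockopt(SO_ERROR);
    return unless defined $err;
    if ($err) {
        $! = $err;
        return;
    }
    return $sock;
}

# Demo against a loopback listener on an ephemeral port.
my $listener = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen: $!";

my $sock = connect_with_timeout('127.0.0.1', $listener->sockport, 5)
    or die "connect: $!";
print "connected\n" if $sock->connected;
```

On a system where select behaves as the standard describes, the select call here should return almost immediately even if the connection completed before select was entered, since the socket is already writable at that point.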
> Again it should be noted that the bug isn't a timeout bug - the
> function quickly returns EBADF - this means that some system call
> received a file descriptor number that doesn't even relate to an open
> file descriptor at all. I can't see how that situation could arise from
> this explanation.
All of my debugging has been on VMS, which never sets EBADF and always
consistently sets ETIMEDOUT. Increasing the timeout, as I said
previously, just means it waits longer before failing. Stepping
through IO::Socket::IP::connect in the Perl debugger clearly shows
that the initial connect sets EINPROGRESS and also that the select
never fires until the timeout expires, even if the timeout is
increased to 10 seconds or more. Supposing that the connect() neither
succeeds nor fails within 10 seconds on a quiet system seems pretty
dubious to me, so the only conclusion I can draw is that the select()
isn't doing what it's intended to.
I've now built blead on Windows 7 and commented out the skip; the
22timeout.t test never failed in a couple dozen runs on a pretty fast
i7 laptop. I believe the failing smokes were on older versions of
Windows, and/or possibly on slower VMs, which may or may not be
relevant. So I don't have anything to debug on Windows. There aren't
that many places in the relevant code, though, where a file descriptor
is passed to a syscall, so that select is still a prime suspect.
Do note that the blocking() implementation, which as far as I can tell
is inherited from IO::Socket, is different on Windows and VMS from
everything else. Again, may or may not be relevant, but it's
different from systems where things are working.