[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110402203727.GI5552@1wt.eu>
Date: Sat, 2 Apr 2011 22:37:27 +0200
From: Willy Tarreau <w@....eu>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Cyril Bonté <cyril.bonte@...e.fr>,
netdev@...r.kernel.org, Daniel Baluta <daniel.baluta@...il.com>,
Gaspar Chilingarov <gasparch@...il.com>,
Charles Duffy <charles@...is.net>
Subject: Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
On Sat, Apr 02, 2011 at 09:44:55PM +0200, Eric Dumazet wrote:
> Le samedi 02 avril 2011 à 21:15 +0200, Willy Tarreau a écrit :
> > Hi Eric,
> >
> > On Sat, Apr 02, 2011 at 08:46:11PM +0200, Cyril Bonté wrote:
> > > Le samedi 2 avril 2011 20:10:48, Eric Dumazet a écrit :
> > > > Le samedi 02 avril 2011 à 20:01 +0200, Cyril Bonté a écrit :
> > > > (...)
> > > > > > if (shutdown(listenfd, SHUT_WR) == 0 &&
> > > > >
> > > > > listen(listenfd, 1024) == 0 &&
> > > > > shutdown(listenfd, SHUT_RD) == 0) {
> > > > >
> > > > > printf("shutdown OK\n");
> > > > >
> > > > > }
> > > > >
> > > > > }
> > > > > exit(0);
> > > > >
> > > > > }
> > > >
> > > > Wow, not clear what this is doing....
> > > >
> > > > for sure the listen() call is not needed ?
> > > >
> > > > And the shutdown(listenfd, SHUT_WR) is clearly useless too.
> > >
> > > Well, I'm not the best one to explain that part but from what i read in the
> > > comments of this part of code, both listen and SHUT_WR are used to detect
> > > errors on various OS (OpenBSD, Solaris, ...).
> > >
> > > > I feel you only needed the shutdown(listenfd, SHUT_RD) call.
> > > >
> > > > Why haproxy needs to setup a second listening socket on same port ?
> > >
> > > I simplified the test case, which is far from what haproxy do (just forgot to
> > > explain the real behaviour).
> > > To reload the configuration, a new haproxy process is launched, sending a
> > > signal to the previous one and asking it to free the ports for a while (the
> > > shutdown part in the test). The new process then tries to bind the ports,
> > > which worked until 2.6.38 (if an error occurs, a new signal is sent to the
> > > previous process to listen to its sockets again).
> >
> > Indeed, here's what normally happens when haproxy reloads.
> >
> > New process is loaded with a new config. Once the config correctly parses,
> > it sends a signal to the previous process asking it to temporarily release
> > its listening ports so that the new one can bind, hence the shutdown(SHUT_RD)
> > performed in the old process.
> >
> > Then the new process can grab the ports and listen to them. Once that's OK,
> > it sends another signal to the old process telling it it can go away. But
> > if the new process failed to completely start (eg: could not grab one port),
> > then it sends a third signal to the old process asking it to rebind the port
> > and serve them again, and the new one dies with an error.
> >
> > That way, the service is never interrupted even if the new config fails
> > late, because the old process has the ability to rebind to the port it
> > temporarily released.
> >
> > Now with 2.6.38, as Cyril diagnosed it, the new bind() fails when the
> > old process has just performed its shutdown(SHUT_RD), preventing the
> > new process from binding to the ports until the old process has
> > definitely closed them.
> >
> > The behaviour is very useful, because the old process might have lost
> > its privileges, it will not have to rebind to the socket, just listen
> > on it again since it is never closed.
> >
> > This is quite embarrassing, because this code used to work for the
> > last 10 years, at least since kernel 2.2, and maybe even 2.0, I don't
> > remember.
> >
> > I'm not sure what the original intent of the patch was, not what was
> > the reported issue, but maybe we could find a way to both fix the
> > reported issue (if any) and restore the old behaviour in order not
> > to break existing programs.
> >
> > Best regards,
> > Willy
> >
>
> I wish it was that simple....
>
> http://www.spinics.net/lists/netdev/msg151551.html
What a mess :-(
I've been used to actively bind() to source ip:ports when dealing with that
number of connections, because I've long noticed that the port auto-selection
did not work once all source ports were used on at least one IP address.
Managing a source port list in user space is no big deal when you have to
support hundreds of thousands of connections, as there are harder issues
to deal with :-/
> Is Cyril program running OK on FreeBsd ?
I don't think so, as from memories, both FreeBSD and OpenBSD fail
on isten() after a shutdown(SHUT_RD), hence the strange looking
shut+listen+shut sequence you noticed (in order to detect whether
listen will work again or not).
I'm just wondering the relation between the SHUT_RD listen sockets
that we catch by accident and the issue that was the initial goal
of the patch regarding outgoing sockets. All this is not very clear
to me yet.
Regards,
Willy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists