Message-ID: <20080613063037.GA16943@elte.hu>
Date: Fri, 13 Jun 2008 08:30:37 +0200
From: Ingo Molnar <mingo@...e.hu>
To: David Miller <davem@...emloft.net>
Cc: kuznet@....inr.ac.ru, vgusev@...nvz.org, mcmanus@...ksong.com,
xemul@...nvz.org, netdev@...r.kernel.org,
ilpo.jarvinen@...sinki.fi, linux-kernel@...r.kernel.org
Subject: Re: [TCP]: TCP_DEFER_ACCEPT causes leak sockets
* David Miller <davem@...emloft.net> wrote:
> From: David Miller <davem@...emloft.net>
> Date: Wed, 11 Jun 2008 16:52:55 -0700 (PDT)
>
> > More and more, the arguments are mounting to completely revert the
> > established code path changes, and frankly that is likely what I am
> > going to do by the end of today.
>
> Here is the revert patch I intend to send to Linus:
>
> tcp: Revert 'process defer accept as established' changes.
>
> This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
> ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
> the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
> ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
>
> This change causes several problems, first reported by Ingo Molnar
> as a distcc-over-loopback regression where connections were getting
> stuck.
>
> Ilpo Järvinen first spotted the locking problems. The new function
> added by this code, tcp_defer_accept_check(), only has the
> child socket locked, yet it is modifying state of the parent
> listening socket.
>
> Fixing that is non-trivial at best, because we can't simply just grab
> the parent listening socket lock at this point, because it would
> create an ABBA deadlock. The normal ordering is parent listening
> socket --> child socket, but this code path would require the
> reverse lock ordering.
>
> Next is a problem noticed by Vitaliy Gusev. He noted:
>
> ----------------------------------------
> >--- a/net/ipv4/tcp_timer.c
> >+++ b/net/ipv4/tcp_timer.c
> >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> > goto death;
> > }
> >
> >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
> >+ tcp_send_active_reset(sk, GFP_ATOMIC);
> >+ goto death;
>
> Here socket sk is not attached to listening socket's request queue. tcp_done()
> will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
> release this sk) as socket is not DEAD. Therefore socket sk will be lost for
> freeing.
> ----------------------------------------
>
> Finally, Alexey Kuznetsov argues that there might not even be any
> real value or advantage to these new semantics even if we fix all
> of the bugs:
>
> ----------------------------------------
> Hiding from accept() sockets with only out-of-order data
> is the only thing which is impossible with the old approach. Is this really
> so valuable? My opinion: no, this is nothing but a new loophole
> to consume memory without control.
> ----------------------------------------
>
> So revert this thing for now.
>
> Signed-off-by: David S. Miller <davem@...emloft.net>

the 3 reverts have been extensively tested in -tip via:

  # tip/out-of-tree: 9e5b6ca: tcp: revert DEFER_ACCEPT modifications

and the distcc problems are fixed. (The locking fix alone did not fix it
conclusively in my testing, possibly due to the follow-on observations
outlined in your description.)

Tested-by: Ingo Molnar <mingo@...e.hu>

	Ingo
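
For readers unfamiliar with the ABBA problem described in the quoted
commit message, here is a minimal userspace sketch of the lock-ordering
rule. It is not kernel code and not anything proposed in this thread:
struct toy_sock, the two globals and both function names are
hypothetical, and pthread mutexes stand in for the kernel's socket
locks. The first function takes the locks in the normal parent -> child
order. The second models a path that already holds the child lock and
wants the parent; blocking there would be the reverse order, so the
sketch uses a trylock and backs off instead. That back-off is just one
generic way to avoid the inversion. The revert takes the simpler route
of removing the code path.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a socket: a lock plus some state. */
struct toy_sock {
	pthread_mutex_t lock;
	int state;
};

static struct toy_sock parent = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct toy_sock child  = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Normal ordering: parent (listening) lock first, then child lock. */
int update_in_normal_order(void)
{
	pthread_mutex_lock(&parent.lock);
	pthread_mutex_lock(&child.lock);
	parent.state++;
	child.state++;
	pthread_mutex_unlock(&child.lock);
	pthread_mutex_unlock(&parent.lock);
	return 0;
}

/*
 * A path that holds only the child lock but wants to modify the parent.
 * A blocking lock on the parent here would be child -> parent, the
 * reverse of the order above, and two threads could then deadlock.
 * Try the parent lock and give up if it is busy.
 * Returns 0 on success, or EBUSY if the parent lock was already held.
 */
int update_parent_from_child(void)
{
	int err;

	pthread_mutex_lock(&child.lock);
	err = pthread_mutex_trylock(&parent.lock);
	if (err == 0) {
		parent.state++;
		pthread_mutex_unlock(&parent.lock);
	}
	pthread_mutex_unlock(&child.lock);
	return err;
}
```

A caller that gets EBUSY back has to retry later or defer the work,
which is part of why fixing the original code was described as
non-trivial.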