Message-ID: <1298104556.8559.21.camel@edumazet-laptop>
Date: Sat, 19 Feb 2011 09:35:56 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
David Miller <davem@...emloft.net>
Cc: Arnaldo Carvalho de Melo <acme@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>, Ingo Molnar <mingo@...e.hu>,
linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, Pavel Emelyanov <xemul@...nvz.org>,
Daniel Lezcano <daniel.lezcano@...e.fr>
Subject: [PATCH] tcp: fix inet_twsk_deschedule()

On Friday, February 18, 2011 at 12:38 -0800, Eric W. Biederman wrote:
> Arnaldo Carvalho de Melo <acme@...hat.com> writes:
>
> > Em Fri, Feb 18, 2011 at 05:01:28PM -0200, Arnaldo Carvalho de Melo escreveu:
> >> Em Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds escreveu:
> >> > This seems to be a fairly straightforward bug.
> >> >
> >> > In net/ipv4/inet_timewait_sock.c we have this:
> >> >
> >> > /* These are always called from BH context. See callers in
> >> >  * tcp_input.c to verify this.
> >> >  */
> >> >
> >> > /* This is for handling early-kills of TIME_WAIT sockets. */
> >> > void inet_twsk_deschedule(struct inet_timewait_sock *tw,
> >> > 			  struct inet_timewait_death_row *twdr)
> >> > {
> >> > 	spin_lock(&twdr->death_lock);
> >> > 	..
> >> >
> >> > and the intention is clearly that that spin_lock is BH-safe because
> >> > it's called from BH context.
> >> >
> >> > Except that clearly isn't true. It's called from a worker thread:
> >> >
> >> > > stack backtrace:
> >> > > Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
> >> > > Call Trace:
> >> > > [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
> >> > > [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
> >> > > [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
> >> > > [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
> >> > > [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
> >> > > [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
> >> > > [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
> >> > > [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
> >> > > [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
> >> > > [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330
> >> >
> >> > so it can deadlock with a BH happening at the same time, afaik.
> >> >
> >> > The code (and comment) is all from 2005; it looks like the BH -> worker
> >> > thread conversion has broken the code. But somebody who knows that code
> >> > better should take a deeper look at it.
> >> >
> >> > Added acme to the cc, since the code is attributed to him back in 2005
> >> > ;). Although I don't know how active he's been in networking lately
> >> > (seems to be all perf-related). Whatever, it can't hurt.
> >>
> >> Original code is ANK's, I just made it possible to use with DCCP, and
> >> yeah, the smiley is appropriate, something 6 years old and the world
> >> around it changing continually... well, thanks for the git blame ;-)
> >
> > But yeah, your analysis seems correct, with the bug being introduced by
> > one of those "world around it changing continually" issues: network
> > namespaces broke the rules of the game in their cleanup_net() routine.
> > Adding Pavel to the CC list since it doesn't hurt ;-)
>
> Which probably gets the bug back around to me.
>
> I guess this must be one of those ipv4 cases where the cleanup
> simply did not exist in the rmmod sense that we had to invent.
>
> I think that was Daniel who did the time wait sockets. I do remember
> they were a real pain.
>
> Would a bh_disable be sufficient? I guess I should stop remembering and
> look at the code now.
>
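
For illustration, here is a minimal sketch of the deadlock pattern Linus
describes; the lock and function names are hypothetical, not actual
kernel code:

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(demo_lock);

	/* BH (softirq) context: plain spin_lock() is fine here, because
	 * on a given CPU a softirq cannot be interrupted by another
	 * softirq.
	 */
	static void demo_bh_handler(void)
	{
		spin_lock(&demo_lock);
		/* ... */
		spin_unlock(&demo_lock);
	}

	/* Process context (e.g. a workqueue, like cleanup_net()): if a
	 * softirq fires on this CPU while we hold the lock with plain
	 * spin_lock(), demo_bh_handler() spins on it forever: deadlock.
	 * Disabling BH around the critical section closes the window.
	 */
	static void demo_process_context(void)
	{
		local_bh_disable();
		spin_lock(&demo_lock);
		/* ... */
		spin_unlock(&demo_lock);
		local_bh_enable();
	}
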
Here is the patch to fix the problem.

Daniel's commit d315492b1a6ba29d (netns : fix kernel panic in timewait
socket destruction) was OK: it did use local_bh_disable().

The problem comes from commit 575f4cd5a5b6394577 (net: Use rcu lookups
in inet_twsk_purge.), added in 2.6.33.

Thanks !

[PATCH] tcp: fix inet_twsk_deschedule()
Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule().

This is caused by inet_twsk_purge(), which now runs from process
context: commit 575f4cd5a5b6394577 (net: Use rcu lookups in
inet_twsk_purge.) removed the BH disabling that was necessary.

Add the BH disabling back, but fine grained: right around the call to
inet_twsk_deschedule(), instead of around the whole function.

With help from Linus Torvalds and Eric W. Biederman.

Reported-by: Eric W. Biederman <ebiederm@...ssion.com>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
CC: Daniel Lezcano <daniel.lezcano@...e.fr>
CC: Pavel Emelyanov <xemul@...nvz.org>
CC: Arnaldo Carvalho de Melo <acme@...hat.com>
CC: stable <stable@...nel.org> (# 2.6.33+)
---
 net/ipv4/inet_timewait_sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index c5af909..3c8dfa1 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -505,7 +505,9 @@ restart:
 			}
 
 			rcu_read_unlock();
+			local_bh_disable();
 			inet_twsk_deschedule(tw, twdr);
+			local_bh_enable();
 			inet_twsk_put(tw);
 			goto restart_rcu;
 		}
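
Note, as an alternative (hypothetical, not what this patch does), the
lock could have been made BH-safe inside inet_twsk_deschedule() itself:

	/* Hypothetical alternative, not the applied fix: */
	spin_lock_bh(&twdr->death_lock);
	/* ... */
	spin_unlock_bh(&twdr->death_lock);

But that would add the BH disable/enable cost to every caller, while
the normal callers in tcp_input.c already run in BH context; wrapping
only the rare process-context call site in inet_twsk_purge() keeps the
hot path unchanged.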
--