Message-ID: <20091019155034.GA5233@lenovo>
Date: Mon, 19 Oct 2009 19:50:34 +0400
From: Cyrill Gorcunov <gorcunov@...il.com>
To: Michal Ostrowski <mostrows@...il.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
Denys Fedoryschenko <denys@...p.net.lb>,
netdev <netdev@...r.kernel.org>, linux-ppp@...r.kernel.org,
paulus@...ba.org, mostrows@...thlink.net
Subject: Re: kernel panic in latest vanilla stable, while using nameif with
"alive" pppoe interfaces
[Michal Ostrowski - Mon, Oct 19, 2009 at 08:19:23AM -0500]
|
| The entire scheme for managing net namespaces seems unsafe. We depend
| on synchronization via pn->hash_lock, but have no guarantee of the
| existence of the "net" object -- hence no way to ensure the existence
| of the lock itself. This should be relatively easy to fix though as
| we should be able to get/put the net namespace as we add/remove
| objects to/from the pppoe hash.
|
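The get/put idea above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_net, struct model_item and the model_* helpers are hypothetical stand-ins, not the pppoe code, and the plain int counter stands in for the reference the kernel would take with get_net()/put_net(). Locking is left out on purpose; the point is only that every hashed object pins the namespace that owns the hash table (and therefore its lock) until the object is unhashed.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HASH_SIZE 16

struct model_net;

/* Hypothetical stand-in for a hashed pppox socket. */
struct model_item {
    struct model_net *net;      /* namespace this item is hashed in */
    struct model_item *next;
};

/* Hypothetical stand-in for the per-namespace pppoe state. */
struct model_net {
    int refcnt;                                  /* models the namespace refcount */
    struct model_item *hash_table[MODEL_HASH_SIZE];
};

static void model_get_net(struct model_net *net)
{
    net->refcnt++;
}

static void model_put_net(struct model_net *net)
{
    assert(net->refcnt > 0);    /* a put without a matching get is a bug */
    net->refcnt--;
}

/* Hashing an item takes a namespace reference on its behalf. */
static void model_hash_add(struct model_net *net, struct model_item *item,
                           unsigned int slot)
{
    slot %= MODEL_HASH_SIZE;
    model_get_net(net);
    item->net = net;
    item->next = net->hash_table[slot];
    net->hash_table[slot] = item;
}

/* Unhashing drops that reference; returns 1 if the item was found. */
static int model_hash_del(struct model_net *net, struct model_item *item,
                          unsigned int slot)
{
    struct model_item **pp = &net->hash_table[slot % MODEL_HASH_SIZE];

    while (*pp) {
        if (*pp == item) {
            *pp = item->next;
            item->next = NULL;
            item->net = NULL;
            model_put_net(net);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}
```

With this pairing, the namespace count can only return to its base value once the hash table is empty, so the lock embedded in the namespace cannot disappear while entries still exist.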
Hmm... it seems not. The only possible scenario I see for such a nonexistent
namespace is that it was cached via RCU and returned before the grace period
elapsed, so perhaps we need to call synchronize_net() somewhere.
|
| Once you solve this existence issue, the flush_lock can be eliminated
| altogether since all of the relevant code paths already depend on a
| write_lock_bh(&pn->hash_lock), and that's the lock that should be used
| to protect the pppoe_dev field.
|
| Another patch to follow later...
|
| --
| Michal Ostrowski
| mostrows@...il.com
|
|
|
| On Mon, Oct 19, 2009 at 7:36 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
| > Michal Ostrowski a écrit :
| >> Here's my theory on this after an initial look...
| >>
| >> Looking at the oops report and disassembly of the actual module binary
| >> that caused the oops, one can deduce that:
| >>
| >> Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct
| >> pppox_sock *po", which is what we faulted on, executing "cmp %eax, 0x190(%ebx)".
| >> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").
| >>
| >> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value
| >> of 0xffffffff in pn->hash_table[i].
| >>
| >> From this I'd hypothesize that the combination of dev_put() and release_sock()
| >> may have allowed us to free "pn". At the bottom of the loop we already
| >> recognize that since locks are dropped we're responsible for handling
| >> invalidation of objects, and perhaps that should be extended to "pn" as well.
| >> --
| >> Michal Ostrowski
| >> mostrows@...il.com
| >>
| >>
| >
| > Looking at this stuff, I do believe flush_lock protection is not
| > properly done.
| >
| > At the end of pppoe_connect() for example we can find :
| >
| > err_put:
| > if (po->pppoe_dev) {
| > dev_put(po->pppoe_dev);
| > po->pppoe_dev = NULL;
| > }
Yep, this is unsafe, thanks!
| >
| > This is done without any protection, and can therefore clash with
| > pppoe_flush_dev() :
| >
| > spin_lock(&flush_lock);
| > po->pppoe_dev = NULL; /* pppoe_dev can already be NULL before this point */
| > spin_unlock(&flush_lock);
| >
| > dev_put(dev); /* oops */
| >
|
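The race described above comes down to two paths that each clear po->pppoe_dev and drop a device reference without agreeing on who owns that reference. A minimal user-space sketch of the intended rule follows: test the pointer, drop the reference and clear the pointer all under the same lock, so that exactly one caller performs the put. The model_* names are hypothetical, a pthread mutex stands in for the flush_lock spinlock, and the NULL check inside the helper is an assumption of this sketch rather than something taken from the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device and its reference count. */
struct model_dev {
    int refcnt;
};

/* Hypothetical stand-in for struct pppox_sock. */
struct model_sock {
    struct model_dev *pppoe_dev;    /* mirrors po->pppoe_dev */
};

/* A mutex models the flush_lock spinlock in this user-space sketch. */
static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;

/* Only called with flush_lock held, so a plain int is enough here. */
static void model_dev_put(struct model_dev *dev)
{
    assert(dev->refcnt > 0);        /* a second put would trip this */
    dev->refcnt--;
}

/*
 * Detach the device from the socket. The check, the put and the
 * clearing of the pointer form one critical section, so two callers
 * can never both drop the same reference.
 * Returns 1 if this caller dropped the reference, 0 otherwise.
 */
static int model_detach_dev(struct model_sock *po)
{
    int dropped = 0;

    pthread_mutex_lock(&flush_lock);
    if (po->pppoe_dev) {
        model_dev_put(po->pppoe_dev);
        po->pppoe_dev = NULL;
        dropped = 1;
    }
    pthread_mutex_unlock(&flush_lock);

    return dropped;
}
```

In this model both the flush path and the connect error path would call model_detach_dev(); whichever runs second sees NULL and does nothing, which is the property the unprotected err_put code lacks.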
Denys, could you check if the patch below helps?
-- Cyrill
---
drivers/net/pppoe.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6.git/drivers/net/pppoe.c
=====================================================================
--- linux-2.6.git.orig/drivers/net/pppoe.c
+++ linux-2.6.git/drivers/net/pppoe.c
@@ -312,9 +312,9 @@ static void pppoe_flush_dev(struct net_d
}
sk = sk_pppox(po);
spin_lock(&flush_lock);
+ dev_put(po->pppoe_dev);
po->pppoe_dev = NULL;
spin_unlock(&flush_lock);
- dev_put(dev);
/* We always grab the socket lock, followed by the
* hash_lock, in that order. Since we should
@@ -708,10 +708,12 @@ end:
release_sock(sk);
return error;
err_put:
+ spin_lock(&flush_lock);
if (po->pppoe_dev) {
dev_put(po->pppoe_dev);
po->pppoe_dev = NULL;
}
+ spin_unlock(&flush_lock);
goto end;
}