Message-Id: <200902041418.09630.rusty@rustcorp.com.au>
Date: Wed, 4 Feb 2009 14:18:08 +1030
From: Rusty Russell <rusty@...tcorp.com.au>
To: Karsten Keil <kkeil@...e.de>
Cc: linux-kernel@...r.kernel.org, Michal Hocko <mhocko@...e.cz>,
richard kennedy <richard@....demon.co.uk>,
Dan Williams <dan.j.williams@...el.com>,
Dmitry Torokhov <dmitry.torokhov@...il.com>,
Russell King <rmk+kernel@....linux.org.uk>,
dwmw2@...radead.org, Scott Wood <scottwood@...escale.com>,
netdev@...r.kernel.org, Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [RFC] Suspicious bug in module refcounting
On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
> The refcount is a per-CPU atomic variable; module_refcount() simply adds
> up all the per-CPU values in a fully unprotected loop (IRQs not disabled,
> not protected against scheduling).
Hi Karsten,
Yes, the BUG_ON() is overly aggressive. And I really hate __module_get,
and it looks like most of the callers are completely bogus. The watchdog
drivers use it to nail themselves in place in their open routines: this is
OK, if a bit weird.
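On the first point, the false zero can be replayed deterministically in user space. This is a minimal sketch, assuming two CPUs and one fixed interleaving; percpu_model, true_refcount() and racy_tally() are made-up names for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* User-space stand-in for the per-CPU reference counters (an assumption,
 * not the kernel's data layout). */
struct percpu_model {
	long ref[MODEL_NR_CPUS];
};

/* The total a caller would like to see: sum with nothing running in between. */
static long true_refcount(const struct percpu_model *mod)
{
	long total = 0;

	for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		total += mod->ref[cpu];
	return total;
}

/*
 * Tally one CPU at a time, as an unprotected loop would, while replaying a
 * fixed interleaving: after CPU 0 has been sampled, a "get" runs on CPU 0
 * and a "put" runs on CPU 1 before the reader samples CPU 1.
 */
static long racy_tally(struct percpu_model *mod)
{
	long total = mod->ref[0];	/* reader samples CPU 0 */

	mod->ref[0]++;			/* get on CPU 0: missed by the reader */
	mod->ref[1]--;			/* put on CPU 1: seen by the reader */

	total += mod->ref[1];		/* reader samples CPU 1 */
	return total;
}
```

Starting from ref = { 0, 1 }, the true count is 1 before and after the interleaved get/put, yet racy_tally() returns 0, which is exactly the value the BUG_ON() tests for.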
You should only use __module_get() when you *can't handle* failure;
otherwise you should accept that the admin did rmmod --wait and not use the
module any further.
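The distinction can be sketched with a simplified user-space model. Everything here (try_model, model_try_get(), model_put(), model_open()) is hypothetical and only mimics the shape of the try_module_get()/module_put() pairing; the real try_module_get() also accepts a NULL module, which this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for struct module; all names are hypothetical. */
enum try_model_state { TRY_MODEL_LIVE, TRY_MODEL_GOING };

struct try_model {
	enum try_model_state state;
	long refcount;
};

/* Model of try_module_get(): refuse once the module is on its way out. */
static bool model_try_get(struct try_model *mod)
{
	if (mod->state != TRY_MODEL_LIVE)
		return false;
	mod->refcount++;
	return true;
}

/* Model of module_put(): drop a reference taken earlier. */
static void model_put(struct try_model *mod)
{
	mod->refcount--;
}

/* A caller that can fail: report an error instead of pinning the module. */
static int model_open(struct try_model *owner)
{
	if (!model_try_get(owner))
		return -ENODEV;
	/* ... use the module; pair with model_put() on release ... */
	return 0;
}
```

A caller shaped like model_open() has an error path, so it has no need for the unconditional variant: when the module is going away, it backs off and leaves the refcount untouched.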
dmaengine.c seems to be taking liberties like this. AFAICT it can error
out, so why not just try_module_get() always?
gameport.c, serio.c and input.c increment their own refcount, but to get
into those init functions someone must be holding a refcount already (ie. a
module depends on this module). Ditto cyber2000fb.c, and MTD.
mdio-bitbang.c should definitely use try_module_get.
loop.c bumps its own refcount; Al might know why, but it can definitely be
try_module_get() if it's valid at all.
net/socket.c can also handle failure, so that's another try_module_get.
etc.
> I think we should replace all unprotected __module_get() calls with
> try_module_get(), or remove __module_get() completely.
Agreed. We will need a "nail_module()" call for those legitimate uses (which
should clear mod->exit, rather than manipulating the refcount at all).
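A rough user-space sketch of what such a call might look like. nail_module() does not exist yet, so nail_model, nail_module_model() and model_can_unload() are purely illustrative names. The sketch assumes the unload path keeps refusing, absent a forced unload, any module that has an init function but no exit function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of struct module. */
struct nail_model {
	int (*init)(void);
	void (*exit)(void);
};

/* Sketch of the proposed call: drop the exit hook, leave refcounts alone. */
static void nail_module_model(struct nail_model *mod)
{
	mod->exit = NULL;
}

/*
 * Model of the unload-time check (assumption: init without exit means the
 * module cannot be removed; forced unload is ignored here).
 */
static bool model_can_unload(const struct nail_model *mod)
{
	return !(mod->init && !mod->exit);
}

/* Dummy hooks so the model has something to point at. */
static int demo_init(void) { return 0; }
static void demo_exit(void) { }
```

With both hooks set, model_can_unload() returns true; after nail_module_model() it returns false, and no reference count has been touched, so nothing depends on an unreliable tally.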
Meanwhile, I'll remove the BUG_ON for 2.6.29.
Thanks,
Rusty.
module: remove over-zealous check in __module_get()
module_refcount() isn't reliable outside stop_machine(). As demonstrated by
Karsten Keil <kkeil@...e.de>, networking can trigger the BUG_ON() under load
(for example, an inc on one cpu and a dec on another while module_refcount()
is tallying can give false results).
Almost no one should be using __module_get(), but that's another issue.
Signed-off-by: Rusty Russell <rusty@...tcorp.com.au>
diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -407,7 +407,6 @@ static inline void __module_get(struct m
 static inline void __module_get(struct module *module)
 {
 	if (module) {
-		BUG_ON(module_refcount(module) == 0);
 		local_inc(__module_ref_addr(module, get_cpu()));
 		put_cpu();
 	}