Message-ID: <Ya69y7hPo52V0kRy@T590>
Date:   Tue, 7 Dec 2021 09:50:03 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Petr Mladek <pmladek@...e.com>, linux-kernel@...r.kernel.org,
        Luis Chamberlain <mcgrof@...nel.org>
Subject: Re: [PATCH V2 2/2] kobject: wait until kobject is cleaned up before
 freeing module
On Mon, Dec 06, 2021 at 09:04:40AM +0100, Greg Kroah-Hartman wrote:
> On Mon, Dec 06, 2021 at 10:13:53AM +0800, Ming Lei wrote:
> > On Fri, Dec 03, 2021 at 04:07:39PM +0100, Greg Kroah-Hartman wrote:
> > > On Mon, Nov 29, 2021 at 11:45:09AM +0800, Ming Lei wrote:
> > > > kobject_put() may become asynchronous because of
> > > > CONFIG_DEBUG_KOBJECT_RELEASE: once the last kobject_put() returns, the
> > > > caller may expect the kobject to have been released, however
> > > > CONFIG_DEBUG_KOBJECT_RELEASE just schedules a delayed work function
> > > > to clean up the kobject.
> > > 
> > > The caller should NOT expect the kobject to be released.  That's the
> > > whole point of dynamic reference counted objects, you never "know" when
> > > the last object is released.  This option just makes it obvious so that
> > > you know when to fix up code that has this assumption.
> > 
> > Yes, so CONFIG_DEBUG_KOBJECT_RELEASE needs to be fixed.
> 
> What is broken with it today?  It is there for you to find problems in
> your kernel code that uses kobjects.  What oops/crash/whatever is it
> causing that you feel it should not be causing?
> 
> A module's kobject is "owned" by the module core, not the module code

No, this patch has nothing to do with the module's own kobject; we are
talking about any kobject allocated/released by a driver built as a module.

> that is being unloaded, so I don't see the problem here.  More details
> please.

If CONFIG_DEBUG_KOBJECT_RELEASE is enabled, kobject_release() schedules a
delayed work item with a random delay to run kobject_cleanup(), and that
work may run after the module which allocates/frees the kobject has been
unloaded.

kobject_cleanup():
	struct kobj_type *t = get_ktype(kobj);
	...
	if (t && t->release) {
		pr_debug("kobject: '%s' (%p): calling ktype release\n",
			 kobject_name(kobj), kobj);
		t->release(kobj);
	}

Both the kobj_type instance and its ->release callback live in the
module's data/text sections, so a kernel panic is triggered when
't && t->release' is evaluated from the delayed work context after the
module has been unloaded.
> 
> > > > Inside the cleanup handler, kobj->ktype and kobj->ktype->release are
> > > > required.
> > > 
> > > Yes. Is that a problem?
> > 
> > Of course, for CONFIG_DEBUG_KOBJECT_RELEASE, which delays the ->release
> > call by a random time, it is a problem once the module storing ->ktype
> > and ->ktype->release has been unloaded.
> > 
> > As I mentioned, the issue can be triggered 100% of the time by
> > 'modprobe -r kset-example' when CONFIG_DEBUG_KOBJECT_RELEASE is enabled
> > and the 1st patch is applied.
> 
> Is there any "real" kernel code that this causes problems on?
> 
> Again, this is for debugging, yes, this tiny example will crash that
> way, but that is fine, as we can obviously see that the kernel code here
> is correct.

Nothing is wrong with kset-example; the issue is simply that foo_ktype and
foo_release live in the code/data sections of the 'kset-example' module.
There are ~150 such uses in the tree:

[linux]$ git grep -n "static struct kobj_type" ./  | grep "{" | wc
    153     923   11676

Most of this code can be built as a module, so all of it can hit the same
problem; that is why I consider it a generic issue rather than something
specific to kset-example. kset-example is cited here only because it shows
the issue easily.
> 
> And if you really want to ensure that it works properly, let's wait on
> release before allowing that module to be unloaded.  But again, module

Then every module which uses kobjects would need such a change.

> unload is NOT a normal operation and is not what this debugging option
> was created to help out with.

But CONFIG_MODULES and CONFIG_DEBUG_KOBJECT_RELEASE can be enabled at the
same time.
> 
> Again, the confusion between kobjects (which protect data) and module
> references (which protect code) is getting mixed up here.
> 
> > > > It is assumed that there is no more activity on the kobject itself once
> > > > module_exit() has started, so it is reasonable for the kobject user or
> > > > driver to expect that the kobject is really released by the last
> > > > kobject_put() in the module_exit() code path. Otherwise, it can be
> > > > considered a driver bug, since the module is going away.
> > > 
> > > Why is module_exit() somehow special here?  What is so odd about that?
> > 
> > After module_exit() is done, the module will be unloaded, and any code
> > or data stored in the module can no longer be referenced.
> > 
> > > 
> > > > When ->ktype and ->ktype->release are allocated as module static
> > > > variables, this can cause trouble because the delayed cleanup handler
> > > > may run after the module is unloaded.
> > > 
> > > Why is ktype and release part of module code?
> > 
> > Lots of drivers define ktype and ktype->release as static variables in
> > their modules.
> 
> They do?  Where?
> 
> > > What module kobject is causing this problem?
> > 
> > Any module which defines its ktype and ktype->release as module static
> > variables, which is pretty common.
> 
> What non-example code does this?  Let's fix that.
> 
> > > > Fix the issue by flushing the scheduled kobject cleanup work before
> > > > freeing the module.
> > > 
> > > Why are modules special here?
> > > 
> > > And if you enable this option, and then start unloading kernel modules,
> > > yes, things can go wrong, but that's not what this kernel option is for
> > > at all.
> > > 
> > > This feels like a hack for not a real problem.
> > 
> > I think it is caused by CONFIG_DEBUG_KOBJECT_RELEASE; that is why this
> > patch was posted. Otherwise I'd suggest removing
> > CONFIG_DEBUG_KOBJECT_RELEASE, since it is not supposed to panic the
> > kernel when nothing is wrong on the driver side.
> 
> Perhaps just put a nice warning in that debug option that says "beware
> of unloading modules with this option enabled."
> 
> Or better yet, forbid it if that option is enabled :)

Do you mean disabling CONFIG_DEBUG_KOBJECT_RELEASE when CONFIG_MODULES is
enabled?

Thanks,
Ming