[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160506004222.GA31986@packer-debian-8-amd64.digitalocean.com>
Date: Thu, 5 May 2016 20:42:22 -0400
From: Jessica Yu <jeyu@...hat.com>
To: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Miroslav Benes <mbenes@...e.cz>, Jiri Kosina <jikos@...nel.org>,
pmladek@...e.com, jslaby@...e.cz, live-patching@...r.kernel.org,
linux-kernel@...r.kernel.org, huawei.libin@...wei.com,
minfei.huang@...oo.com
Subject: Re: livepatch: allow removal of a disabled patch
+++ Josh Poimboeuf [05/05/16 10:04 -0500]:
>On Thu, May 05, 2016 at 04:25:48PM +0200, Miroslav Benes wrote:
>> On Thu, 5 May 2016, Josh Poimboeuf wrote:
>>
>> > On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
>> > > I think it boils down to the following problem.
>> > >
>> > > 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
>> > >
>> > > 2. we have dynamic kobjects, so there is a pointer in klp_patch to struct
>> > > kobject
>> > >
>> > > 3. it is allocated during klp_init_patch() and all is fine
>> > >
>> > > 4. now we want to remove the patch module. It is disabled and module_put()
>> > > is called. User calls rmmod on the module.
>> > >
>> > > 5. klp_unregister_patch() is called in __exit method.
>> > >
>> > > 6. klp_free_patch() is called.
>> > >
>> > > 7. kobject_put(patch->kobj) is called.
>> > >
>> > > ...now it gets interesting...
>> > >
>> > > 8. among others kobject_cleanup() is scheduled as a delayed work (this is
>> > > important).
>> > >
>> > > 9. there is no completion, so kobject_put returns and the module goes
>> > > away.
>> > >
>> > > 10. someone calls patch enabled_store attribute (for example). They can
>> > > because kobject_cleanup() has not been called yet. It is delayed
>> > > scheduled.
>> > >
>> > > ...crash...
>> >
>> > But what exactly causes the crash? In enabled_store() we can see that
>> > the patch isn't in the list, so we can return -EINVAL.
>>
>> Ok, bad example. Take enabled_show() instead. It could be fixed in the
>> same way, but I am not sure it is the right thing to do. It does not scale
>> because the problem is elsewhere.
>>
>> Anyway, it is (even if theoretically) there in my opinion and we
>> have two options.
>>
>> 1. We could forget about CONFIG_DEBUG_KOBJECT_RELEASE and all is ok
>> without completion and regardless of dynamic/static kobject allocation.
>>
>> 2. We introduce completion and we are ok even with
>> CONFIG_DEBUG_KOBJECT_RELEASE=y and again regardless of dynamic/static
>> kobject allocation.
>
>I would disagree with the statement that the dynamic kobject doesn't
>scale. We would just need a helper function to get from a kobject to
>its klp_patch.
>
>In fact, to me it seems like the right way to do it. It doesn't make
>sense for the code which creates the kobject to be different from the
>code which initializes it. It's slightly out of context, but
>kobject.txt does say:
>
> "Code which creates a kobject must, of course, initialize that object."
>
>I view the completion as a hack to compensate for the fact that we're
>abusing the kobject interface. And so it makes sense to me that
>CONFIG_DEBUG_KOBJECT_RELEASE would cause problems, because we're using
>kobjects in the wrong way.
>
>So in my view, the two options are:
>
>1. Convert the kobject to dynamic as I described.
>
>2. Change the klp_register() interface so that klp_patch gets allocated
> in livepatch code.
>
>I'd be curious to hear what others think.
So, I think both of these solutions would enable us to get rid of
the completion. Let me try to summarize..
For solution #1, if we dynamically allocate the kobject, i.e. we have a
pointer now instead of having it embedded in the klp_patch struct, we no
longer need to worry if the corresponding klp_patch gets deallocated under
our nose. Since the kobject_cleanup function is delayed w/
CONFIG_DEBUG_KOBJECT_RELEASE, it is possible to have sysfs entries that
refer to a klp_patch that no longer exists. Thus if any of the sysfs
functions get called, we would have to take care to ensure that the
klp_patch struct corresponding to the kobject in question actually still
exists. In this case, all sysfs functions would require an extra check to
make sure the matching klp_patch is still on the patches list and return an
error if it isn't found. The "pro" is that this change would be simple, the
"con" is that now kobjects are decoupled and managed completely separately
from the object (klp_patch) with which they are associated, which doesn't
feel 100% right.
For solution #2, we could have livepatch manage the (de)allocation of
klp_patch objects internally. Maybe in this scenario the caller would need
to request a klp_patch object be allocated and the caller would fill out the
returned klp_patch struct as appropriate. In this case, we would be able
leave the kobject embedded in the klp_patch struct (and dynamic kobjects
wouldn't be needed), as livepatch would now have control of both structures.
Then during the patch module exit path, when kobject_put is called, the
klp_patch struct would be freed in its kobject's release function. We
wouldn't have to hold up rmmod, and delayed execution of kobject_cleanup
wouldn't break anything, because a klp_patch would then have the same "lifespan"
as its corresponding kobject, and therefore it would be safe to invoke
enabled_store & co. up until kobject_cleanup is finally executed. We'd be
able to use container_of in this case as well. In addition, we wouldn't
have to force all sysfs functions to support an awkward edge-case (i.e.
checking if the corresponding klp_patch still exists). I think this
solution also matches better with the typical use-case of the kobject
release method, as described in kobject.txt (replacing 'my_object' with
klp_patch):
---
(...)
This notification is done through a kobject's release() method.
Usually such a method has a form like:
void my_object_release(struct kobject *kobj)
{
struct my_object *mine = container_of(kobj, struct my_object, kobj);
/* Perform any additional cleanup on this object, then... */
kfree(mine);
}
---
Apologies for the giant wall of text. In any case I feel like solution #2
is actually closer in line with how kobjects are normally used, embedded in
the structures they refer to, which get deallocated once their refcount
hits 0. What do people think?
Jessica
Powered by blists - more mailing lists