linux-kernel - Re: [RFC PATCH] livepatch: allow removal of a disabled patch

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160504161423.pvupipravfxuyktz@treble>
Date:	Wed, 4 May 2016 11:14:23 -0500
From:	Josh Poimboeuf <jpoimboe@...hat.com>
To:	Miroslav Benes <mbenes@...e.cz>
Cc:	Jiri Kosina <jikos@...nel.org>, jeyu@...hat.com, pmladek@...e.com,
	jslaby@...e.cz, live-patching@...r.kernel.org,
	linux-kernel@...r.kernel.org, huawei.libin@...wei.com,
	minfei.huang@...oo.com
Subject: Re: [RFC PATCH] livepatch: allow removal of a disabled patch

On Wed, May 04, 2016 at 04:35:28PM +0200, Miroslav Benes wrote:
> On Wed, 4 May 2016, Josh Poimboeuf wrote:
> 
> > On Wed, May 04, 2016 at 01:58:47PM +0200, Miroslav Benes wrote:
> > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > 
> > > > On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> > > > > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > > > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > > > > 
> > > > > > > > 1. Do we really need a completion? If I am not missing something
> > > > > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > > > > > active protection.
> > > > > > > 
> > > > > > > What do you mean by "kernfs active protection"?  I see that
> > > > > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > > > > > that a write to a sysfs file uses that lock.
> > > > > > > 
> > > > > > > I'm probably missing something...
> > > > > > 
> > > > > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> > > > > > he has on mind is per-kernfs_node active refcounting kernfs does (see 
> > > > > > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > > > > > 
> > > > > > More specifically, execution of store() and show() sysfs callbacks is 
> > > > > > guaranteed (by kernfs) to happen with that particular attribute's active 
> > > > > > reference held for reading (and that makes it impossible for that 
> > > > > > attribute to vanish prematurely).
> > > > > 
> > > > > Thanks, that makes sense.
> > > > > 
> > > > > So what exactly is the problem the completion is trying to solve?  Is it
> > > > > to ensure that the kobject has been cleaned up before it returns to the
> > > > > caller, in case the user wants to call klp_register() again after
> > > > > unregistering?
> > > > > 
> > > > > If so, that's quite an unusual use case which I think we should just
> > > > > consider unsupported.  In fact, if you try to do it, kobject_init()
> > > > > complains loudly because kobj->state_initialized is still 1 because
> > > > > kobjects aren't meant to be reused like that.
> > > > 
> > > > ... and now I realize the point is actually to prevent the caller from
> > > > freeing klp_patch before kobject_cleanup() runs.
> > > 
> > > Exactly. Sorry I was so brief.
> > > 
> > > > So yeah, it looks like we need the completion in case
> > > > CONFIG_DEBUG_KOBJECT_RELEASE is enabled.
> > > > 
> > > > Or alternatively we could convert patch->kobj to be dynamically
> > > > allocated instead of embedded in klp_patch.
> > > 
> > > But that wouldn't help, would it? Dynamic kobjects registers generic 
> > > release function dynamic_kobj_release() and that's it. We're in the same 
> > > situation. I have got a feeling that dynamic kobjects are only for trivial 
> > > cases.
> > 
> > But the patch release doesn't need to do anything, right?
> 
> That is correct. I wanted to point out that dynamic_kobj_release() did not 
> really solve our "completion" issue. If there is a problem in our code and 
> we need completion, dynamic kobjects would not help. If we don't need a 
> completion at all we could move to dynamic kobjects.

Why would we need a completion if we had a dynamic kobject?  The
lifetime of the kobject no longer matters if it's separated from the
klp_patch struct.  The user can go ahead and free the klp_patch before
the kobject gets freed, no problem.

> There is still container_of() though.

Yes, but we could still find the corresponding klp_patch by searching
through the klp_patches list.  It's not ideal, but IMO it would be
better than needing the completion to account for our "special"
situation.

> > > Moreover we use container_of() several times in the code and that does not
> > > work with dynamically allocated kobjects.
> > > 
> > > Anyway I am really confused now. When I read changelog of c817a67ecba7 
> > > ("kobject: delayed kobject release: help find buggy drivers") all makes 
> > > perfect sense. But isn't our situation somewhat special, because we have 
> > > refcounts completely under control? So we know that once we call 
> > > kobject_put() we can let a patch go... I must be missing something.
> > > 
> > > It does not make sense to introduce completion just to satisfy a feature 
> > > which was introduced to debug general cases.
> > 
> > I think our situation is "special" because klp_patch and its embedded
> > kobject have separate lifetimes.  We have a kobject, which we own, which
> > is embedded in a klp_patch struct, which we don't own.
> > 
> > If I understand correctly, normally when you release a kobject, the
> > containing struct gets freed.  But we can't do that here because the
> > caller allocated the klp_patch.
> 
> Normally only struct kobject is freed, no? 
> 
> From Documentation/kobject.txt:
> 
> "One important point cannot be overstated: every kobject must have a
> release() method, and the kobject must persist (in a consistent state)
> until that method is called. If these constraints are not met, the code is
> flawed."
> 
> So we need to only make sure that klp_patch does not disappear before 
> calling release() method. In our case that cannot happen even without 
> completion, because we call kobject_put() in klp_unregister_patch() when 
> the patch module is going and kobject_put() calls our release (potentially 
> empty) method in a "sync" way. If I read that code correctly.
> 
> This does not hold only if CONFIG_DEBUG_KOBJECT_RELEASE=y.

Hm, maybe I'm missing your point, or maybe you're missing mine.  Maybe
both :-)

I understand why your patch has a completion.  And I understand that
it's needed because of CONFIG_DEBUG_KOBJECT_RELEASE.  But even then I
think it's only needed *if* the kobject is embedded in the klp_patch
struct.

In my experience, usually the point of having a kobject release function
is so you can kfree the kobject's containing struct.  That doesn't
conflict with your quote from kobject.txt.

But in our "special" code, klp_patch (and thus the kobject itself) is
allocated by the caller.  But its embedded kobject is initialized by
livepatch code.  In my experience, it's highly unusual to have the
kobject allocated by one party and initialized by another.

That's why I think it would be much more logical to turn patch->obj into
a pointer.  Then we can allocate it *and* initialize it, so that we can
100% control its lifetime and not have to worry about the low-level
details of how kobject_del() works, completions,
CONFIG_DEBUG_KOBJECT_RELEASE, etc.  And we would be more in line with
the principles of kobject, as I understand them ;-)

-- 
Josh