linux-kernel - Re: [PATCH 3/3] livepatch: force transition process to finish

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20170530124118.5x2niqew4ygyuxlj@treble>
Date:   Tue, 30 May 2017 07:41:18 -0500
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Miroslav Benes <mbenes@...e.cz>, jeyu@...hat.com, jikos@...nel.org,
        live-patching@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] livepatch: force transition process to finish

On Mon, May 29, 2017 at 02:28:13PM +0200, Petr Mladek wrote:
> On Fri 2017-05-26 12:37:56, Josh Poimboeuf wrote:
> > On Thu, May 25, 2017 at 06:03:07PM +0200, Petr Mladek wrote:
> > > On Thu 2017-05-25 14:59:55, Miroslav Benes wrote:
> > > > 
> > > > > > > In fact, I would suggest to take klp_mutex in force_store()
> > > > > > > and do all actions synchronously, including the check
> > > > > > > of klp_transition_patch.
> > > > > > 
> > > > > > I still think it is better not do it. klp_unmark_tasks() does nothing else 
> > > > > > than tasks already do. They call klp_update_patch_state() by themselves 
> > > > > > and they do not grab klp_mutex lock for doing that. klp_unmark_tasks() 
> > > > > > only forces this action.
> > > > > 
> > > > > You have a point. But I am not convinced ;-) klp_update_patch_state()
> > > > > was called very carefully only when it was safe. The forcing
> > > > > intentionally breaks the consistency model. User should really know
> > > > > what they are doing when they use this feature.
> > > > > 
> > > > > I think that we should actually taint the kernel. Developers should
> > > > > know when users were pulling their legs.
> > > > 
> > > > We could do that. I can change pr_warn() to WARN_ON_ONCE(), which would of 
> > > > course taint the kernel.
> > > 
> > > Sounds good to me.
> > 
> > I'm thinking that WARN_ON_ONCE() seems too severe.  If the patch didn't
> > need a consistency model in the first place then it wouldn't be worth
> > warning about.
> > 
> > We have to trust that the user knows what they're doing.  And that's
> > true for the entire live patching process, including patch analysis and
> > patch creation.  And anyway we already have a taint flag for that:
> > TAINT_LIVEPATCH.
> 
> But the force is done on the user side. Let's say that the authors of
> the livepatch code and of the patches know what they are doing.
> Could we expect the same from the admins that apply the patches?
> 
> TAINT_LIVEPATCH is set because the system behaves differently
> than with the original code. But it still should be consistent.
> Using the force migration might move the system to a wonder land.

True.  If the patch creators don't want the user to "use the force", the
WARNING would be appropriate.  Otherwise, if the patch creators *do*
want the user to force, the WARNING will be overkill and may alarm the
user.

When the user is root, we always have to trust them to a certain extent.
So I'd be more worried about dealing with the fallout of the false
WARNING.  But I don't feel strongly about it either way.

> > > > > > On the other hand, I do not see a problem in doing that. We already have a 
> > > > > > relationship between klp_mutex and tasklist_lock defined elsewhere, so it 
> > > > > > is safe.
> > > > > 
> > > > > Yup.
> > > > > 
> > > > > > It would only serialize things needlessly.
> > > > > 
> > > > > I do not agree. The speed is not important here. Also look
> > > > > into klp_reverse_transition(). We explicitly clear all
> > > > > TIF_PATCH_PENDING flags and call synchronize_rcu() just
> > > > > to make the situation easier and reduce space for potential
> > > > > mistakes.
> > > > 
> > > > Yes, because we had to do that. We ran into problems otherwise. We do not 
> > > > have to do it here. It does not help anything in my opinion.
> > > 
> > > AFAIK, we did not have to do it, see
> > > https://lkml.kernel.org/r/20161222143452.GK25166@pathway.suse.cz
> > > and the comment starting with "It would still leave a small".
> > > 
> > > Just for record, the idea of disabling the TIF flags came from Josh
> > > in another mail. I have just repeated it.
> > > 
> > > I think that the problem already is complex enough and the
> > > serialization would reduce the space of potential races.
> > > But it is possible that I see it just too complex here.
> > 
> > IMO we can skip the mutex.  The consistency model will be broken anyway,
> > so all bets are off.
> 
> I just hope that I will never be forced to debug a system crash
> after this operation.
> 
> Imagine a situation when we send a livepatch using the hybrid
> consistency model that should be safe also in the immediate mode.
> Some processes would get stacked. We suggest forcing because
> it should be safe. And it will break. Then we will want to know
> why this has happened. If the forcing is not serialized, we will
> need to consider/check much more parallel operations.
> 
> But if I am the only one who think this way, it might mean
> that I am over-pessimistic in this context. I will buy
> some head bandage to be prepared and could live without
> the serialization.

I don't feel strongly about this one either way either.  But we all
agree that it doesn't need the mutex now, and I can't think of a
scenario where the code would change such that it would need it.

-- 
Josh