linux-kernel - Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LNX.2.00.1604151107100.4334@pobox.suse.cz>
Date:	Fri, 15 Apr 2016 11:17:55 +0200 (CEST)
From:	Miroslav Benes <mbenes@...e.cz>
To:	Josh Poimboeuf <jpoimboe@...hat.com>
cc:	Jiri Kosina <jikos@...nel.org>, Jessica Yu <jeyu@...hat.com>,
	linux-kernel@...r.kernel.org, live-patching@...r.kernel.org,
	Vojtech Pavlik <vojtech@...e.com>
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency
 model

On Thu, 14 Apr 2016, Josh Poimboeuf wrote:

> On Thu, Apr 14, 2016 at 11:25:18AM +0200, Miroslav Benes wrote:
> > On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
> > 
> > > +/*
> > > + * klp_update_task_universe() - change the patched state of a task
> > > + * @task:	The task to change
> > > + *
> > > + * Converts the patched state of the task so that it will switch to the set of
> > > + * functions in the goal universe.
> > > + */
> > > +static inline void klp_update_task_universe(struct task_struct *task)
> > > +{
> > > +	/*
> > > +	 * The corresponding write barriers are in klp_init_transition() and
> > > +	 * klp_start_transition().  See the comments there for an explanation.
> > > +	 */
> > > +	smp_rmb();
> > > +
> > > +	task->klp_universe = klp_universe_goal;
> > > +}
> > 
> > I wonder whether we should introduce a static_key for this function. 
> > klp_update_task_universe() is called in some important code paths and it 
> > is called everytime if CONFIG_LIVEPATCH is on. It does not matter much for 
> > syscall paths, because we enter slowpath there only if TIF_KLP_NEED_UPDATE 
> > is set and that is set iff patching is in progress. But we call the 
> > function also elsewhere. So maybe it would be nice to introduce static_key 
> > in the function which would be on if the patching is in progress and off 
> > if not.
> > 
> > Maybe it is just an overoptimization though.
> 
> klp_update_task_universe() is called from:
> 
> - exit_to_usermode_loop() (slow path for syscall/irq exit)
> - copy_process() (though I'll be changing this code so children inherit
>   parents' flags)
> - init_idle()
> - cpu_idle_loop()
> - various places in livepatch code
> 
> I think none of those are fast paths, so my feeling is that a static key
> would be overkill.

Agreed.

> > > +static bool klp_try_switch_task(struct task_struct *task)
> > > +{
> > > +	struct rq *rq;
> > > +	unsigned long flags;
> > > +	int ret;
> > > +	bool success = false;
> > > +
> > > +	/* check if this task has already switched over */
> > > +	if (task->klp_universe == klp_universe_goal)
> > > +		return true;
> > 
> > [...]
> > 
> > > +/*
> > > + * This function can be called in the middle of an existing transition to
> > > + * reverse the direction of the universe goal.  This can be done to effectively
> > > + * cancel an existing enable or disable operation if there are any tasks which
> > > + * are stuck in the starting universe.
> > > + */
> > > +void klp_reverse_transition(void)
> > > +{
> > > +	struct klp_patch *patch = klp_transition_patch;
> > > +
> > > +	klp_start_transition(!klp_universe_goal);
> > > +	klp_try_complete_transition();
> > > +
> > > +	patch->enabled = !patch->enabled;
> > > +}
> > 
> > This function could be called iff the patching is in progress, there are 
> > some not-yet-migrated tasks and we wait for scheduled workqueue to run 
> > klp_try_complete_transition() again, right? I have two questions.
> > 
> > 1. Is the call to klp_try_complete_transition() here necessary? It would 
> > be called via workqueue, wouldn't it? I suspect we don't gain much with 
> > this and we introduce possible future problems because of "double 
> > success" of klp_try_complete_transition() (nothing serious there as of 
> > now though). But I might be wrong because I got really confused by this 
> > piece of code and its context :)
> 
> Yeah, the call to klp_try_complete_transition() isn't necessary because
> of the workqueue.  It's still kind of nice to have though, because it
> tries to transition immediately rather than waiting for the work (and we
> might end up deciding to change the delayed work frequency to be less
> often than once per second).

Yes, there are potential benefits with lower frequency.

> I can't really envision that future bugs related to "double success"
> would be much to worry about.  The work function does check for
> klp_transition_patch beforehand, and if it didn't,
> klp_complete_transition() would die loudly with a NULL pointer
> dereference.  Of course, it's always hard to anticipate future code
> changes and I could be completely wrong.

Ah, right. There's a check and we clear klp_transition_patch in 
klp_try_complete_transition() (I always forget there is a call to 
klp_complete_transition "hidden" there :)).

> So I would vote to keep this call, but I don't feel too strongly about
> it since it isn't strictly needed.

Yeah, let's leave it as it is because there is no real harm.

> > 2. klp_reverse_transition() works well because not-yet-migrated tasks are 
> > still in KLP_UNIVERSE_OLD (without loss of generality) and since 
> > klp_universe_goal is swapped they are in fact in their target universe. 
> > klp_try_switch_task() returns immediately in such case. The rest is 
> > migrated in an ordinary way.
> > 
> > What about TIF_KLP_NEED_UPDATE (I know it is introduced in later patch but 
> > it is easier to discuss it here)? klp_start_transition() resets the flag 
> > for all the task. Shouldn't the flag be cleared for the task in 
> > klp_try_switch_task() if task->klp_universe == klp_universe_goal? In fact 
> > it does not matter if it is set. The patching success is determined by 
> > klp_func->klp_universe booleans. But those tasks would needlessly go 
> > through syscall slowpaths (see note above about static_key) which could 
> > hurt performance.
> > 
> > Does this make sense?
> 
> I agree with you, but for a different reason :-)
> 
> And actually there's another way this could happen: if a task is forked
> during klp_start_transition() after setting the universe goal but before
> setting the thread flags, it would already be transitioned but its flag
> would get unnecessarily set.  (Though as I mentioned above, the
> copy_process() code will be changed anyway, so that should no longer be
> an issue).
> 
> But I don't think it's really a performance issue, because:
> 
> 1) Assuming a reliable stack, in most cases the majority of tasks are
>    transitioned immediately.  So when reversing the transition, there
>    would be only a few tasks that would get the flag set unnecessarily.

Correct. Majority of tasks would get the flag cleared before they even get 
a change to go through the syscall gate.

> 2) For those tasks which do have the flag set unnecessarily, only the
>    next syscall will do the slow path.  After that the flag gets
>    cleared.  So it only slows down a single syscall for each task.  At a
>    macro level, that's just a drop in the bucket for performance and,
>    IMO, not worth worrying about and adding (potentially buggy) code
>    for.

Makes sense.

> However, I can think of another reason setting the flag unnecessarily
> might be bad: it's just conceptually confusing and could maybe lead to
> bugs.  (ok, here we go trying to predict the future again!)
> 
> Specifically, after klp_complete_transition(), a reasonable livepatch
> developer (is that an oxymoron? ;-)) might assume that all tasks'
> TIF_KLP_NEED_UPDATE flags are cleared and might write code which makes
> that assumption.
> 
> And, even with today's code it's kind of hard to convince myself that
> having a spurious TIF_KLP_NEED_UPDATE flag set from a previous patching
> operation won't cause issues on the next patch.
> 
> So maybe we should do as you suggest and clear the flag in
> klp_try_switch_task() if the task is already in the goal universe.
> 
> Or if we wanted to be paranoid we could just clear all the flags in
> klp_complete_transition(), but maybe that's a little too sloppy/lazy?

I'll leave it up to you, because I am not able to decide that. Both 
options seem good to me. The former is somewhat more appealing, the latter 
is a safe bet.

Thanks,
Miroslav