[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160506123855.oabdeehqu3aj6nqy@treble>
Date: Fri, 6 May 2016 07:38:55 -0500
From: Josh Poimboeuf <jpoimboe@...hat.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Jessica Yu <jeyu@...hat.com>, Jiri Kosina <jikos@...nel.org>,
Miroslav Benes <mbenes@...e.cz>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Michael Ellerman <mpe@...erman.id.au>,
Heiko Carstens <heiko.carstens@...ibm.com>,
live-patching@...r.kernel.org, linux-kernel@...r.kernel.org,
x86@...nel.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, Vojtech Pavlik <vojtech@...e.com>,
Jiri Slaby <jslaby@...e.cz>,
Chris J Arges <chris.j.arges@...onical.com>,
Andy Lutomirski <luto@...nel.org>
Subject: Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a
per-task consistency model
On Thu, May 05, 2016 at 01:57:01PM +0200, Petr Mladek wrote:
> I have missed that the two commands are called with preemption
> disabled. So, I had the following crazy scenario in mind:
>
>
> CPU0 CPU1
>
> klp_enable_patch()
>
> klp_target_state = KLP_PATCHED;
>
> for_each_task()
> set TIF_PENDING_PATCH
>
> # task 123
>
> if (klp_patch_pending(current)
> klp_patch_task(current)
>
> clear TIF_PENDING_PATCH
>
> smp_rmb();
>
> # switch to assembly of
> # klp_patch_task()
>
> mov klp_target_state, %r12
>
> # interrupt and schedule
> # another task
>
>
> klp_reverse_transition();
>
> klp_target_state = KLP_UNPATCHED;
>
> klt_try_to_complete_transition()
>
> task = 123;
> if (task->patch_state == klp_target_state;
> return 0;
>
> => task 123 is in target state and does
> not block conversion
>
> klp_complete_transition()
>
>
> # disable previous patch on the stack
> klp_disable_patch();
>
> klp_target_state = KLP_UNPATCHED;
>
>
> # task 123 gets scheduled again
> lea %r12, task->patch_state
>
> => it happily stores an outdated
> state
>
Thanks for the clear explanation, this helps a lot.
> This is why the two functions should get called with preemption
> disabled. We should document it at least. I imagine that we will
> use them later also in another context and nobody will remember
> this crazy scenario.
>
> Well, even disabled preemption does not help. The process on
> CPU1 might be also interrupted by an NMI and do some long
> printk in it.
>
> IMHO, the only safe approach is to call klp_patch_task()
> only for "current" on a safe place. Then this race is harmless.
> The switch happen on a safe place, so that it does not matter
> into which state the process is switched.
I'm not sure about this solution. When klp_complete_transition() is
called, we need all tasks to be patched, for good. We don't want any of
them to randomly switch to the wrong state at some later time in the
middle of a future patch operation. How would changing klp_patch_task()
to only use "current" prevent that?
> By other words, the task state might be updated only
>
> + by the task itself on a safe place
> + by other task when the updated on is sleeping on a safe place
>
> This should be well documented and the API should help to avoid
> a misuse.
I think we could fix it to be safe for future callers who might not have
preemption disabled with a couple of changes to klp_patch_task():
disabling preemption and testing/clearing the TIF_PATCH_PENDING flag
before changing the patch state:
void klp_patch_task(struct task_struct *task)
{
preempt_disable();
if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
task->patch_state = READ_ONCE(klp_target_state);
preempt_enable();
}
We would also need a synchronize_sched() after the patching is complete,
either at the end of klp_try_complete_transition() or in
klp_complete_transition(). That would make sure that all existing calls
to klp_patch_task() are done.
--
Josh
Powered by blists - more mailing lists