linux-kernel - Re: [RFC][PATCH 7/7] livepatch,context_tracking: Avoid disturbing NOHZ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YUx9yNfgm4nnd23y@alley>
Date:   Thu, 23 Sep 2021 15:14:48 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     gor@...ux.ibm.com, jpoimboe@...hat.com, jikos@...nel.org,
        mbenes@...e.cz, mingo@...nel.org, linux-kernel@...r.kernel.org,
        joe.lawrence@...hat.com, fweisbec@...il.com, tglx@...utronix.de,
        hca@...ux.ibm.com, svens@...ux.ibm.com, sumanthk@...ux.ibm.com,
        live-patching@...r.kernel.org, paulmck@...nel.org
Subject: Re: [RFC][PATCH 7/7] livepatch,context_tracking: Avoid disturbing
 NOHZ_FULL tasks

On Wed 2021-09-22 13:05:13, Peter Zijlstra wrote:
> When a task is stuck in NOHZ_FULL usermode, we can simply mark the
> livepatch state complete.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
>  kernel/livepatch/transition.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -270,13 +270,24 @@ static int klp_check_task(struct task_st
>  {
>  	int ret;
>  
> -	if (task_curr(task))
> +	if (task_curr(task)) {
> +		if (context_tracking_state_cpu(task_cpu(task)) == CONTEXT_USER) {
> +			/*
> +			 * If we observe the CPU being in USER context, they
> +			 * must issue an smp_mb() before doing much kernel
> +			 * space and as such will observe the patched state,
> +			 * mark it clean.
> +			 */
> +			goto complete;

IMHO, this is not safe:

CPU0				CPU1

klp_check_task(A)
  if (context_tracking_state_cpu(task_cpu(task)) == CONTEXT_USER)
     goto complete;

  clear_tsk_thread_flag(task, TIF_PATCH_PENDING);

				# task switching to kernel space
				klp_update_patch_state(A)
				       if (test_and_clear_tsk_thread_flag(task,	TIF_PATCH_PENDING))
				       //false

				# calling kernel code with old task->patch_state

	task->patch_state = klp_target_state;

BANG: CPU0 sets task->patch_state when task A is already running
	kernel code on CPU1.

> +		}
>  		return -EBUSY;
> +	}
>  
>  	ret = klp_check_stack(task, arg);
>  	if (ret)
>  		return ret;
>  
> +complete:
>  	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
>  	task->patch_state = klp_target_state;

A solution might be to switch ordering and add a barrier here.

>  	return 0;


The code might look like:

static int klp_check_task(struct task_struct *task, void *arg)
{
	int ret;

	if (task_curr(task)) {
		if (context_tracking_state_cpu(task_cpu(task)) == CONTEXT_USER) {
			/*
			 * Task running in USER mode might get switched
			 * immediately. They are switched when entering
			 * kernel code anyway.
			 */
			goto complete;
		}
		return -EBUSY;
	}

	ret = klp_check_stack(task, arg);
	if (ret)
		return ret;

complete:
	WRITE_ONCE(task->patch_state, klp_target_state);
	/*
	 * We switch also tasks running in USER mode here. They must
	 * see the new state before clearing the pending flag.
	 * Otherwise, they might enter kernel mode without switching
	 * the state in klp_update_patch_state().
	 */
	smp_wmb();
	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);

	return 0;
}

The only problem is that the corresponding read barrier is not clear.
It will make more sense if it is paired with some read barrier
in the scheduler after handling TIF flags.

But we should be on the safe side because klp_ftrace_handler() always
does read barrier before reading the state. Though, it is done
there from other reasons.

Best Regards,
Petr