linux-kernel - Re: [PATCH v2 03/11] sched,livepatch: Use task_call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YVw5qO0rLA/GduFm@alley>
Date:   Tue, 5 Oct 2021 13:40:24 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     gor@...ux.ibm.com, jpoimboe@...hat.com, jikos@...nel.org,
        mbenes@...e.cz, mingo@...nel.org, linux-kernel@...r.kernel.org,
        joe.lawrence@...hat.com, fweisbec@...il.com, tglx@...utronix.de,
        hca@...ux.ibm.com, svens@...ux.ibm.com, sumanthk@...ux.ibm.com,
        live-patching@...r.kernel.org, paulmck@...nel.org,
        rostedt@...dmis.org, x86@...nel.org
Subject: Re: [PATCH v2 03/11] sched,livepatch: Use task_call_func()

On Wed 2021-09-29 17:17:26, Peter Zijlstra wrote:
> Instead of frobbing around with scheduler internals, use the shiny new
> task_call_func() interface.
> 
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -274,6 +266,22 @@ static int klp_check_stack(struct task_s
>  	return 0;
>  }
>  
> +static int klp_check_and_switch_task(struct task_struct *task, void *arg)
> +{
> +	int ret;
> +
> +	if (task_curr(task))

This must be

	if (task_curr(task) && task != current)

, otherwise the task is not able to migrate itself. The condition was
lost when reshuffling the original code, see below.

JFYI, I have missed it during review. I am actually surprised that the
process could check its own stack reliably. But it seems to work.

I found the problem when "busy target module" selftest failed.
It was not able to livepatch the workqueue worker that was
proceeding klp_transition_work_fn().


> +		return -EBUSY;
> +
> +	ret = klp_check_stack(task, arg);
> +	if (ret)
> +		return ret;
> +
> +	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> +	task->patch_state = klp_target_state;
> +	return 0;
> +}
> +
>  /*
>   * Try to safely switch a task to the target patch state.  If it's currently
>   * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
> @@ -305,36 +308,31 @@ static bool klp_try_switch_task(struct t
>  	 * functions.  If all goes well, switch the task to the target patch
>  	 * state.
>  	 */
> -	rq = task_rq_lock(task, &flags);
> +	ret = task_call_func(task, klp_check_and_switch_task, &old_name);

It looks correct. JFYI, this is why:

The logic seems to be exactly the same, except for the one fallout
mentioned above. So the only problem might be races.

The only important thing is that the task must not be running on any CPU
when klp_check_stack(task, arg) is called. By other word, the stack
must stay the same when being checked.

The original code prevented races by taking task_rq_lock().
And task_call_func() is slightly more relaxed but it looks safe enough:

  + it still takes rq lock when the task is in runnable state.
  + it always takes p->pi_lock that prevents moving the task
    into runnable state by try_to_wake_up().


> +	switch (ret) {
> +	case 0:		/* success */
> +		break;
>  
> -	if (task_running(rq, task) && task != current) {

This is the original code that checked (task != current).

> -		snprintf(err_buf, STACK_ERR_BUF_SIZE,
> -			 "%s: %s:%d is running\n", __func__, task->comm,
> -			 task->pid);
> -		goto done;

With the added (task != current) check:

Reviewed-by: Petr Mladek <pmladek@...e.com>
Tested-by: Petr Mladek <pmladek@...e.com>

Best Regards,
Petr