linux-kernel - Re: [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog kthreads spawn failed

lists.openwall.net: Open Source and information security mailing list archives

Date:   Thu, 3 Mar 2022 17:49:14 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Zqiang <qiang1.zhang@...el.com>
Cc:     paulmck@...nel.org, linux-kernel@...r.kernel.org,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Uladzislau Rezki <uladzislau.rezki@...y.com>,
        Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog
 kthreads spawn failed

On Mon, Feb 28, 2022 at 05:36:29PM +0800, Zqiang wrote:
> When CONFIG_RCU_NOCB_CPU is enabled and 'rcu_nocbs' is set, the rcuop
> and rcuog kthreads are created. However, the rcuop or rcuog kthread
> creation may fail; if it does, clear the rdp offloaded flags.
> 
> Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> ---
>  kernel/rcu/tree_nocb.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 46694e13398a..94b279147954 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1246,7 +1246,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
>  				"rcuog/%d", rdp_gp->cpu);
>  		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
>  			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> -			return;
> +			goto end;
>  		}
>  		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
>  		if (kthread_prio)
> @@ -1258,12 +1258,22 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
>  	t = kthread_run(rcu_nocb_cb_kthread, rdp,
>  			"rcuo%c/%d", rcu_state.abbr, cpu);
>  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> -		return;
> +		goto end;
>  
>  	if (kthread_prio)
>  		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
>  	WRITE_ONCE(rdp->nocb_cb_kthread, t);
>  	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> +	return;
> +end:
> +	if (cpumask_test_cpu(cpu, rcu_nocb_mask)) {
> +		rcu_segcblist_offload(&rdp->cblist, false);
> +		rcu_segcblist_clear_flags(&rdp->cblist,
> +				SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
> +		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_LOCKING);
> +		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_RCU_CORE);
> +	}

Thank you. The consequences are indeed bad otherwise, because the target CPU is
considered offloaded but nothing actually handles its callbacks.
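A heavily simplified userspace model of that failure mode follows. `struct model_rdp_cbs`, `model_core_invoke()`, `model_kthread_invoke()` and `model_tick()` are hypothetical names, not kernel APIs. The model only assumes what the message states: an rdp flagged as offloaded is skipped by the regular RCU core path and left to the nocb kthreads. With the flag set and no CB kthread, queued callbacks are never invoked.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the kernel's struct rcu_data. */
struct model_rdp_cbs {
	bool offloaded;		/* stands in for the segcblist offload flags */
	bool has_cb_kthread;	/* did the rcuo CB kthread spawn succeed? */
	int pending;		/* ready callbacks waiting to be invoked */
};

/* Regular RCU core path: leaves offloaded rdps to the nocb kthreads. */
static int model_core_invoke(struct model_rdp_cbs *rdp)
{
	int n;

	if (rdp->offloaded)
		return 0;
	n = rdp->pending;
	rdp->pending = 0;
	return n;
}

/* nocb CB kthread path: runs only if the rdp is offloaded AND the kthread exists. */
static int model_kthread_invoke(struct model_rdp_cbs *rdp)
{
	int n;

	if (!rdp->offloaded || !rdp->has_cb_kthread)
		return 0;
	n = rdp->pending;
	rdp->pending = 0;
	return n;
}

/* Give both paths one chance to run; returns the number of callbacks invoked. */
static int model_tick(struct model_rdp_cbs *rdp)
{
	int n = model_core_invoke(rdp);

	return n + model_kthread_invoke(rdp);
}
```

Clearing the offload state on the failure path, as the patch attempts, corresponds to `offloaded = false` in the model, where the core path invokes the callbacks again.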

A few issues though:

* The rdp_gp kthread may be running concurrently. If it's iterating this rdp and
  the SEGCBLIST_LOCKING flag is cleared between its rcu_nocb_lock() and
  rcu_nocb_unlock() calls, rcu_nocb_unlock() won't release the lock (among many
  other possible issues).

* We should clear the CPU from rcu_nocb_mask, or we won't be able to re-offload
  it later.

* We should then delete the rdp from the group list:

     list_del_rcu(&rdp->nocb_entry_rdp);
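The first issue can be illustrated with a minimal userspace sketch, assuming (as described above) that the nocb lock is only taken and released while the rdp is treated as offloaded, i.e. while SEGCBLIST_LOCKING is set. `struct model_rdp_lock`, `model_nocb_lock()`, `model_nocb_unlock()` and `model_lock_leaked()` are hypothetical; a C11 `atomic_flag` stands in for the raw spinlock and a plain `bool` for SEGCBLIST_LOCKING. The concurrent clear is replayed deterministically in a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct rcu_data. */
struct model_rdp_lock {
	atomic_flag nocb_lock;	/* stands in for the rdp's nocb raw spinlock */
	bool locking;		/* stands in for SEGCBLIST_LOCKING */
};

/* Take the lock only while the rdp is treated as offloaded. */
static void model_nocb_lock(struct model_rdp_lock *rdp)
{
	if (!rdp->locking)
		return;
	while (atomic_flag_test_and_set(&rdp->nocb_lock))
		;	/* spin until the holder releases it */
}

/* Release the lock only while the rdp is treated as offloaded. */
static void model_nocb_unlock(struct model_rdp_lock *rdp)
{
	if (!rdp->locking)
		return;
	atomic_flag_clear(&rdp->nocb_lock);
}

/*
 * Replay the race in one thread: an rcuog-like iterator locks the rdp,
 * the failure path optionally clears the flag, then the iterator unlocks.
 * Returns true if the lock is still held afterwards (i.e. leaked).
 */
static bool model_lock_leaked(bool clear_flag_midway)
{
	struct model_rdp_lock rdp = { .nocb_lock = ATOMIC_FLAG_INIT,
				      .locking = true };

	model_nocb_lock(&rdp);
	if (clear_flag_midway)
		rdp.locking = false;	/* failure path clears the flag */
	model_nocb_unlock(&rdp);	/* skipped if the flag was cleared */

	/* test_and_set returns the previous state: true means still held. */
	return atomic_flag_test_and_set(&rdp.nocb_lock);
}
```

In the real kernel the next rcu_nocb_lock() on that rdp would then spin forever, which is why the flags cannot simply be cleared under a running rdp_gp kthread.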

So ideally we should call rcu_nocb_rdp_deoffload(), bearing in mind that:

1) We must hold rcu_state.barrier_mutex and the hotplug read lock. But since
   we are called from rcutree_prepare_cpu(), we may already be holding the
   hotplug write lock.

   Therefore we first need to invert the locking order between
   rcu_state.barrier_mutex and the hotplug lock, and then simply lock the
   barrier_mutex before calling rcu_nocb_rdp_deoffload() from our failure path.

2) In rcu_nocb_rdp_deoffload(), handle nonexistent nocb_gp and/or nocb_cb
   kthreads, and make sure we hold nocb_gp_kthread_mutex.
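Point 1 is a lock-ordering (ABBA) problem. The sketch below is a toy order checker, not lockdep: `enum model_lock`, `acquired_before()` and `abba_possible()` are hypothetical, and the hotplug read and write locks are collapsed into one `HOTPLUG_LOCK` because a reader and a writer exclude each other. It assumes, as the review implies, that the existing deoffload path takes rcu_state.barrier_mutex before the hotplug lock, while the spawn failure path would already hold the hotplug lock when taking barrier_mutex.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock identifiers for the model. */
enum model_lock { HOTPLUG_LOCK, BARRIER_MUTEX };

/* Returns true if lock a is first acquired before lock b in seq[0..n). */
static bool acquired_before(const enum model_lock *seq, size_t n,
			    enum model_lock a, enum model_lock b)
{
	size_t i, pos_a = n, pos_b = n;

	for (i = 0; i < n; i++) {
		if (seq[i] == a && pos_a == n)
			pos_a = i;
		if (seq[i] == b && pos_b == n)
			pos_b = i;
	}
	return pos_a < n && pos_b < n && pos_a < pos_b;
}

/*
 * ABBA check: if one path nests the two locks in one order and the other
 * path nests them in the opposite order, each can hold the lock the other
 * is waiting for, i.e. a potential deadlock.
 */
static bool abba_possible(const enum model_lock *p1, size_t n1,
			  const enum model_lock *p2, size_t n2)
{
	return (acquired_before(p1, n1, HOTPLUG_LOCK, BARRIER_MUTEX) &&
		acquired_before(p2, n2, BARRIER_MUTEX, HOTPLUG_LOCK)) ||
	       (acquired_before(p1, n1, BARRIER_MUTEX, HOTPLUG_LOCK) &&
		acquired_before(p2, n2, HOTPLUG_LOCK, BARRIER_MUTEX));
}
```

Once every path takes the hotplug lock first and barrier_mutex second, the orders agree and the checker reports no ABBA risk for the failure path.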

I'm going to take your patch and adapt it along those lines.

Thanks!