linux-kernel - Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20180911160532.GJ4225@linux.vnet.ibm.com>
Date:   Tue, 11 Sep 2018 09:05:32 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        tglx@...utronix.de, Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [PATCH] rcu: Use cpus_read_lock() while looking at
 cpu_online_mask

On Mon, Sep 10, 2018 at 03:56:16PM +0200, Sebastian Andrzej Siewior wrote:
> It was possible that sync_rcu_exp_select_cpus() enqueued something on
> CPU0 while CPU0 was offline. Such a work item wouldn't be processed
> until CPU0 gets back online. This problem was addressed in commit
> fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
> don't think the issue fully addressed.
> 
> Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
> on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
> remains online between looking at cpu_online_mask and invoking
> queue_work_on() on CPU1.
> 
> Use cpus_read_lock() to ensure that `cpu' is not going down between
> looking at cpu_online_mask at invoking queue_work_on() and waiting for
> its completion. It is added around the loop + flush_work() which is
> similar to work_on_cpu_safe() (and we can have multiple jobs running on
> NUMA systems).

Is this experimental or theoretical?  If theoretical, the counter-theory
is that the stop-machine processing prevents any of the cpu_online_mask
bits from changing, though, yes, we would like to get rid of the
stop-machine processing.  So either way, yes, the current state could
use some improvement.

But one problem with the patch below is that sync_rcu_exp_select_cpus()
can be called while the cpu_hotplug_lock is write-held.  Or is that
somehow OK these days?  Assuming not, how about the (untested) patch
below?

							Thanx, Paul

> Fixes: fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being
> 		      offline")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
>  kernel/rcu/tree_exp.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 01b6ddeb4f050..a104cf91e6b90 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -479,6 +479,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  	sync_exp_reset_tree(rsp);
>  	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("select"));
> 
> +	cpus_read_lock();
>  	/* Schedule work for each leaf rcu_node structure. */
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		rnp->exp_need_flush = false;
> @@ -493,13 +494,11 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  			continue;
>  		}
>  		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
> -		preempt_disable();
>  		cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
>  		/* If all offline, queue the work on an unbound CPU. */
>  		if (unlikely(cpu > rnp->grphi))
>  			cpu = WORK_CPU_UNBOUND;
>  		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
> -		preempt_enable();
>  		rnp->exp_need_flush = true;
>  	}
> 
> @@ -507,6 +506,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  	rcu_for_each_leaf_node(rsp, rnp)
>  		if (rnp->exp_need_flush)
>  			flush_work(&rnp->rew.rew_work);
> +	cpus_read_unlock();
>  }
> 
>  static void synchronize_sched_expedited_wait(struct rcu_state *rsp)

commit 5214cbbfe6a5d6b92c76c4e411a049fe57245d4a
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date:   Tue Sep 11 08:57:48 2018 -0700

    rcu: Stop expedited grace periods from relying on stop-machine
    
    The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
    to prevent the cpu_online_mask from changing.  However, this relies on
    the stop-machine mechanism in the CPU-hotplug offline code, which is not
    desirable (it would be good to someday remove the stop-machine mechanism).
    
    This commit therefore instead uses the relevant leaf rcu_node structure's
    ->ffmask, which has a bit set for all CPUs that are fully functional.
    A given CPU's bit is cleared very early during offline processing by
    rcutree_offline_cpu() and set very late during online processsing by
    rcutree_online_cpu().  Therefore, if a CPU's bit is set in this mask, and
    preemption is disabled, we have to be before the synchronize_sched() in
    the CPU-hotplug offline code, which means that the CPU is guaranteed to be
    workqueue-ready throughout the duration of the enclosing preempt_disable()
    region of code.
    
    This also has the side-effect of using WORK_CPU_UNBOUND if all the CPUs for
    this leaf rcu_node structure are offline, which is an acceptable difference
    in behavior.
    
    Reported-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8d18c1014e2b..e669ccf3751b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -450,10 +450,12 @@ static void sync_rcu_exp_select_cpus(smp_call_func_t func)
 		}
 		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
 		preempt_disable();
-		cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
+		cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
 		/* If all offline, queue the work on an unbound CPU. */
-		if (unlikely(cpu > rnp->grphi))
+		if (unlikely(cpu > rnp->grphi - rnp->grplo))
 			cpu = WORK_CPU_UNBOUND;
+		else
+			cpu += rnp->grplo;
 		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
 		preempt_enable();
 		rnp->exp_need_flush = true;