Message-ID: <5bf17538-8828-e580-90de-0aaa8e6b2a4d@codeaurora.org>
Date: Thu, 24 Sep 2020 10:30:07 +0530
From: Neeraj Upadhyay <neeraju@...eaurora.org>
To: paulmck@...nel.org
Cc: josh@...htriplett.org, rostedt@...dmis.org,
mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
joel@...lfernandes.org, rcu@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2] rcu/tree: Correctly handle single cpu check in rcu_blocking_is_gp
Hi Paul,
On 9/24/2020 2:33 AM, Paul E. McKenney wrote:
> On Wed, Sep 23, 2020 at 12:59:33PM +0530, Neeraj Upadhyay wrote:
>> Currently, for non-preempt kernels (with CONFIG_PREEMPTION=n),
>> rcu_blocking_is_gp() checks (with preemption disabled) whether
>> there is only one CPU online, using num_online_cpus(). If only a
>> single CPU is online, synchronize_rcu() is optimized to return
>> without doing all the work of waiting for a grace period. However,
>> there are a few issues with this num_online_cpus() check during
>> the transition of __num_online_cpus from 2 to 1 on the CPU-down
>> path and from 1 to 2 on the CPU-up path.
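>>
>> For reference, the pre-patch check looks roughly like this (an
>> abridged sketch of rcu_blocking_is_gp() from kernel/rcu/tree.c;
>> see also the diff further down for the surrounding context):
>>
>>     static int rcu_blocking_is_gp(void)
>>     {
>>             int ret;
>>
>>             if (IS_ENABLED(CONFIG_PREEMPTION))
>>                     return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
>>             might_sleep();  /* Check for RCU read-side critical section. */
>>             preempt_disable();
>>             ret = num_online_cpus() <= 1;  /* Lockless, hence racy, sample. */
>>             preempt_enable();
>>             return ret;
>>     }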
>
> Again, good catch!
>
> As usual, I could not resist editing the commit log and comments, so
> could you please look the following over to make sure that I did not
> mess anything up?
>
The commit log and comments look very good! Thanks!
Thanks
Neeraj
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 7af8c1c8d13c6c9c7013fbfef77f843dbc430c4b
> Author: Neeraj Upadhyay <neeraju@...eaurora.org>
> Date: Wed Sep 23 12:59:33 2020 +0530
>
> rcu: Fix single-CPU check in rcu_blocking_is_gp()
>
> Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses
> num_online_cpus() to determine whether there is only one CPU online. When
> there is only one CPU online, the simple fact that synchronize_rcu()
> could be legally called implies that a full grace period has elapsed.
> Therefore, in the single-CPU case, synchronize_rcu() simply returns
> immediately. Unfortunately, num_online_cpus() is unreliable while a
> CPU-hotplug operation is transitioning to or from single-CPU operation
> because:
>
> 1.  num_online_cpus() uses atomic_read(&__num_online_cpus) to
>     locklessly sample the number of online CPUs. The hotplug locks
>     are not held, which means that an incoming CPU can concurrently
>     update this count. This in turn means that an RCU read-side
>     critical section on the incoming CPU might observe updates
>     prior to the grace period, but also that this critical section
>     might extend beyond the end of the optimized synchronize_rcu().
>     This breaks RCU's fundamental guarantee.
>
> 2.  In addition, num_online_cpus() does no ordering, thus providing
>     another way that RCU's fundamental guarantee can be broken by
>     the current code.
>
> 3.  The most probable failure mode happens on outgoing CPUs.
>     The outgoing CPU updates the count of online CPUs in the
>     CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in
>     and of itself due to preemption being disabled at the call
>     to num_online_cpus(). Unfortunately, after that stop-machine
>     handler returns, the CPU takes one last trip through the
>     scheduler (which has RCU readers) and, after the resulting
>     context switch, one final dive into the idle loop. During this
>     time, RCU needs to keep track of two CPUs, but num_online_cpus()
>     will say that there is only one, which in turn means that the
>     surviving CPU will incorrectly ignore the outgoing CPU's RCU
>     read-side critical sections.
>
> This problem is illustrated by the following litmus test in which P0()
> corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU.
> The herd7 tool confirms that the "exists" clause can be satisfied,
> thus demonstrating that this breakage can happen according to the Linux
> kernel memory model.
>
> {
>   int x = 0;
>   atomic_t numonline = ATOMIC_INIT(1);
> }
>
> P0(int *x, atomic_t *numonline)
> {
>   int r0;
>   WRITE_ONCE(*x, 1);
>   r0 = atomic_read(numonline);
>   if (r0 == 1) {
>     smp_mb();
>   } else {
>     synchronize_rcu();
>   }
>   WRITE_ONCE(*x, 2);
> }
>
> P1(int *x, atomic_t *numonline)
> {
>   int r0; int r1;
>
>   atomic_inc(numonline);
>   smp_mb();
>   rcu_read_lock();
>   r0 = READ_ONCE(*x);
>   smp_rmb();
>   r1 = READ_ONCE(*x);
>   rcu_read_unlock();
> }
>
> locations [x;numonline;]
>
> exists (1:r0=0 /\ 1:r1=2)
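>
> As a usage note (assuming the LKMM tooling in the kernel source
> tree's tools/memory-model directory, and the test saved under a
> hypothetical name such as sync-rcu-online.litmus), the result can
> be reproduced with:
>
>   herd7 -conf linux-kernel.cfg sync-rcu-online.litmus
>
> where an "Observation ... Sometimes" line in the output indicates
> that the "exists" clause is reachable.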
>
> It is important to note that these problems arise only when the system
> is transitioning to or from single-CPU operation.
>
> One solution would be to hold the CPU-hotplug locks while sampling
> num_online_cpus(), which was in fact the intent of the (redundant)
> preempt_disable() and preempt_enable() surrounding this call to
> num_online_cpus(). Actually blocking CPU hotplug would not only result
> in excessive overhead, but would also unnecessarily impede CPU-hotplug
> operations.
>
> This commit therefore follows long-standing RCU tradition by maintaining
> a separate RCU-specific set of CPU-hotplug books.
>
> This separate set of books is implemented by a new ->n_online_cpus field
> in the rcu_state structure that maintains RCU's count of the online CPUs.
> This count is incremented early in the CPU-online process, so that
> the critical transition away from single-CPU operation will occur when
> there is only a single CPU. Similarly for the critical transition to
> single-CPU operation, the counter is decremented late in the CPU-offline
> process, again while there is only a single CPU. Because there is only
> ever a single CPU when the ->n_online_cpus field undergoes the critical
> 1->2 and 2->1 transitions, full memory ordering and mutual exclusion are
> provided implicitly and, better yet, for free.
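>
> As a userspace analogy (a sketch only, not kernel code, with all
> names hypothetical), a counter whose critical transitions happen
> while only one thread exists needs no atomics or barriers, because
> thread creation and join provide the required ordering, much as the
> CPU-hotplug code does for ->n_online_cpus:
>
>   #include <pthread.h>
>   #include <stdio.h>
>
>   static int n_online = 1;  /* Analogous to rcu_state.n_online_cpus. */
>
>   static void *worker(void *arg)
>   {
>           (void)arg;
>           /* pthread_create() orders the 1->2 increment before this point. */
>           return NULL;
>   }
>
>   int main(void)
>   {
>           pthread_t tid;
>
>           n_online++;  /* 1->2: still single-threaded, no atomics needed. */
>           if (pthread_create(&tid, NULL, worker, NULL))
>                   return 1;
>           pthread_join(&tid, NULL);  /* Like the CPU-offline teardown. */
>           n_online--;  /* 2->1: single-threaded again. */
>           printf("n_online = %d\n", n_online);
>           return 0;
>   }
>
> Here the increment happens before the second thread can run and the
> decrement happens after it is gone, so no data race is possible.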
>
> In the case where the CPU is coming online, nothing will happen until
> the current CPU helps it come online. Therefore, the new CPU will see
> all accesses prior to the optimized grace period, which means that RCU
> does not need to further delay this new CPU. In the case where the CPU
> is going offline, the outgoing CPU is totally out of the picture before
> the optimized grace period starts, which means that this outgoing CPU
> cannot see any of the accesses following that grace period. Again,
> RCU needs no further interaction with the outgoing CPU.
>
> This does mean that synchronize_rcu() will unnecessarily do a few grace
> periods the hard way just before the second CPU comes online and just
> after the second-to-last CPU goes offline, but it is not worth optimizing
> this uncommon case.
>
> Signed-off-by: Neeraj Upadhyay <neeraju@...eaurora.org>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6d9ec8e..9c56b63 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2402,6 +2402,7 @@ int rcutree_dead_cpu(unsigned int cpu)
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>  		return 0;
>  
> +	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
>  	/* Adjust any no-longer-needed kthreads. */
>  	rcu_boost_kthread_setaffinity(rnp, -1);
>  	/* Do any needed no-CB deferred wakeups from this CPU. */
> @@ -3604,7 +3605,20 @@ static int rcu_blocking_is_gp(void)
>  		return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
>  	might_sleep();  /* Check for RCU read-side critical section. */
>  	preempt_disable();
> -	ret = num_online_cpus() <= 1;
> +	/*
> +	 * If the rcu_state.n_online_cpus counter is equal to one,
> +	 * there is only one CPU, and that CPU sees all prior accesses
> +	 * made by any CPU that was online at the time of its access.
> +	 * Furthermore, if this counter is equal to one, its value cannot
> +	 * change until after the preempt_enable() below.
> +	 *
> +	 * Furthermore, if rcu_state.n_online_cpus is equal to one here,
> +	 * all later CPUs (both this one and any that come online later
> +	 * on) are guaranteed to see all accesses prior to this point
> +	 * in the code, without the need for additional memory barriers.
> +	 * Those memory barriers are provided by CPU-hotplug code.
> +	 */
> +	ret = READ_ONCE(rcu_state.n_online_cpus) <= 1;
>  	preempt_enable();
>  	return ret;
>  }
> @@ -3649,7 +3663,7 @@ void synchronize_rcu(void)
>  			 lock_is_held(&rcu_sched_lock_map),
>  			 "Illegal synchronize_rcu() in RCU read-side critical section");
>  	if (rcu_blocking_is_gp())
> -		return;
> +		return; // Context allows vacuous grace periods.
>  	if (rcu_gp_is_expedited())
>  		synchronize_rcu_expedited();
>  	else
> @@ -3989,6 +4003,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	rcu_prepare_kthreads(cpu);
>  	rcu_spawn_cpu_nocb_kthread(cpu);
> +	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1);
>  
>  	return 0;
>  }
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index e4f66b8..805c9eb 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -298,6 +298,7 @@ struct rcu_state {
>  						/* Hierarchy levels (+1 to */
>  						/* shut bogus gcc warning) */
>  	int ncpus;				/* # CPUs seen so far. */
> +	int n_online_cpus;			/* # CPUs online for RCU. */
>  
>  	/* The following fields are guarded by the root rcu_node's lock. */
>  
>
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation