Date:   Thu, 24 Sep 2020 10:30:07 +0530
From:   Neeraj Upadhyay <neeraju@...eaurora.org>
To:     paulmck@...nel.org
Cc:     josh@...htriplett.org, rostedt@...dmis.org,
        mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
        joel@...lfernandes.org, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2] rcu/tree: Correctly handle single cpu check in
 rcu_blocking_is_gp

Hi Paul,

On 9/24/2020 2:33 AM, Paul E. McKenney wrote:
> On Wed, Sep 23, 2020 at 12:59:33PM +0530, Neeraj Upadhyay wrote:
>> Currently, for non-preempt kernels (with CONFIG_PREEMPTION=n),
>> rcu_blocking_is_gp() checks, with preemption disabled, whether
>> there is only one CPU online, using num_online_cpus() to make
>> that decision. If only a single CPU is online, synchronize_rcu()
>> is optimized to return without doing all the work of waiting for
>> a grace period. However, there are a few issues with this
>> num_online_cpus() check during the transition of __num_online_cpus
>> from 2 -> 1 on the CPU-down path and from 1 -> 2 on the CPU-up
>> path.
> 
> Again, good catch!
> 
> As usual, I could not resist editing the commit log and comments, so
> could you please look the following over to make sure that I did not
> mess anything up?
> 

The commit log and comments look very good! Thanks!


Thanks
Neeraj

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 7af8c1c8d13c6c9c7013fbfef77f843dbc430c4b
> Author: Neeraj Upadhyay <neeraju@...eaurora.org>
> Date:   Wed Sep 23 12:59:33 2020 +0530
> 
>      rcu: Fix single-CPU check in rcu_blocking_is_gp()
>      
>      Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses
>      num_online_cpus() to determine whether there is only one CPU online.  When
>      there is only single CPU online, the simple fact that synchronize_rcu()
>      could be legally called implies that a full grace period has elapsed.
>      Therefore, in the single-CPU case, synchronize_rcu() simply returns
>      immediately.  Unfortunately, num_online_cpus() is unreliable while a
>      CPU-hotplug operation is transitioning to or from single-CPU operation
>      because:
>      
>      1.      num_online_cpus() uses atomic_read(&__num_online_cpus) to
>              locklessly sample the number of online CPUs.  The hotplug locks
>              are not held, which means that an incoming CPU can concurrently
>              update this count.  This in turn means that an RCU read-side
>              critical section on the incoming CPU might observe updates
>              prior to the grace period, but also that this critical section
>              might extend beyond the end of the optimized synchronize_rcu().
>              This breaks RCU's fundamental guarantee.
>      
>      2.      In addition, num_online_cpus() does no ordering, thus providing
>              another way that RCU's fundamental guarantee can be broken by
>              the current code.
>      
>      3.      The most probable failure mode happens on outgoing CPUs.
>              The outgoing CPU updates the count of online CPUs in the
>              CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in
>              and of itself due to preemption being disabled at the call
>              to num_online_cpus().  Unfortunately, after that stop-machine
>              handler returns, the CPU takes one last trip through the
>              scheduler (which has RCU readers) and, after the resulting
>              context switch, one final dive into the idle loop.  During this
>              time, RCU needs to keep track of two CPUs, but num_online_cpus()
>              will say that there is only one, which in turn means that the
>              surviving CPU will incorrectly ignore the outgoing CPU's RCU
>              read-side critical sections.
>      
>      This problem is illustrated by the following litmus test in which P0()
>      corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU.
>      The herd7 tool confirms that the "exists" clause can be satisfied,
>      thus demonstrating that this breakage can happen according to the Linux
>      kernel memory model.
>      
>         {
>           int x = 0;
>           atomic_t numonline = ATOMIC_INIT(1);
>         }
>      
>         P0(int *x, atomic_t *numonline)
>         {
>           int r0;
>           WRITE_ONCE(*x, 1);
>           r0 = atomic_read(numonline);
>           if (r0 == 1) {
>             smp_mb();
>           } else {
>             synchronize_rcu();
>           }
>           WRITE_ONCE(*x, 2);
>         }
>      
>         P1(int *x, atomic_t *numonline)
>         {
>           int r0; int r1;
>      
>           atomic_inc(numonline);
>           smp_mb();
>           rcu_read_lock();
>           r0 = READ_ONCE(*x);
>           smp_rmb();
>           r1 = READ_ONCE(*x);
>           rcu_read_unlock();
>         }
>      
>         locations [x;numonline;]
>      
>         exists (1:r0=0 /\ 1:r1=2)
>      
>      It is important to note that these problems arise only when the system
>      is transitioning to or from single-CPU operation.
>      
>      One solution would be to hold the CPU-hotplug locks while sampling
>      num_online_cpus(), which was in fact the intent of the (redundant)
>      preempt_disable() and preempt_enable() surrounding this call to
>      num_online_cpus().  Actually blocking CPU hotplug would not only result
>      in excessive overhead, but would also unnecessarily impede CPU-hotplug
>      operations.
>      
>      This commit therefore follows long-standing RCU tradition by maintaining
>      a separate RCU-specific set of CPU-hotplug books.
>      
>      This separate set of books is implemented by a new ->n_online_cpus field
>      in the rcu_state structure that maintains RCU's count of the online CPUs.
>      This count is incremented early in the CPU-online process, so that
>      the critical transition away from single-CPU operation will occur when
>      there is only a single CPU.  Similarly for the critical transition to
>      single-CPU operation, the counter is decremented late in the CPU-offline
>      process, again while there is only a single CPU.  Because there is only
>      ever a single CPU when the ->n_online_cpus field undergoes the critical
>      1->2 and 2->1 transitions, full memory ordering and mutual exclusion are
>      provided implicitly and, better yet, for free.
>      
>      In the case where the CPU is coming online, nothing will happen until
>      the current CPU helps it come online.  Therefore, the new CPU will see
>      all accesses prior to the optimized grace period, which means that RCU
>      does not need to further delay this new CPU.  In the case where the CPU
>      is going offline, the outgoing CPU is totally out of the picture before
>      the optimized grace period starts, which means that this outgoing CPU
>      cannot see any of the accesses following that grace period.  Again,
>      RCU needs no further interaction with the outgoing CPU.
>      
>      This does mean that synchronize_rcu() will unnecessarily do a few grace
>      periods the hard way just before the second CPU comes online and just
>      after the second-to-last CPU goes offline, but it is not worth optimizing
>      this uncommon case.
>      
>      Signed-off-by: Neeraj Upadhyay <neeraju@...eaurora.org>
>      Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6d9ec8e..9c56b63 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2402,6 +2402,7 @@ int rcutree_dead_cpu(unsigned int cpu)
>   	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>   		return 0;
>   
> +	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
>   	/* Adjust any no-longer-needed kthreads. */
>   	rcu_boost_kthread_setaffinity(rnp, -1);
>   	/* Do any needed no-CB deferred wakeups from this CPU. */
> @@ -3604,7 +3605,20 @@ static int rcu_blocking_is_gp(void)
>   		return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
>   	might_sleep();  /* Check for RCU read-side critical section. */
>   	preempt_disable();
> -	ret = num_online_cpus() <= 1;
> +	/*
> +	 * If the rcu_state.n_online_cpus counter is equal to one,
> +	 * there is only one CPU, and that CPU sees all prior accesses
> +	 * made by any CPU that was online at the time of its access.
> +	 * Furthermore, if this counter is equal to one, its value cannot
> +	 * change until after the preempt_enable() below.
> +	 *
> +	 * Furthermore, if rcu_state.n_online_cpus is equal to one here,
> +	 * all later CPUs (both this one and any that come online later
> +	 * on) are guaranteed to see all accesses prior to this point
> +	 * in the code, without the need for additional memory barriers.
> +	 * Those memory barriers are provided by CPU-hotplug code.
> +	 */
> +	ret = READ_ONCE(rcu_state.n_online_cpus) <= 1;
>   	preempt_enable();
>   	return ret;
>   }
> @@ -3649,7 +3663,7 @@ void synchronize_rcu(void)
>   			 lock_is_held(&rcu_sched_lock_map),
>   			 "Illegal synchronize_rcu() in RCU read-side critical section");
>   	if (rcu_blocking_is_gp())
> -		return;
> +		return;  // Context allows vacuous grace periods.
>   	if (rcu_gp_is_expedited())
>   		synchronize_rcu_expedited();
>   	else
> @@ -3989,6 +4003,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
>   	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>   	rcu_prepare_kthreads(cpu);
>   	rcu_spawn_cpu_nocb_kthread(cpu);
> +	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1);
>   
>   	return 0;
>   }
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index e4f66b8..805c9eb 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -298,6 +298,7 @@ struct rcu_state {
>   						/* Hierarchy levels (+1 to */
>   						/*  shut bogus gcc warning) */
>   	int ncpus;				/* # CPUs seen so far. */
> +	int n_online_cpus;			/* # CPUs online for RCU. */
>   
>   	/* The following fields are guarded by the root rcu_node's lock. */
>   
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation
