Message-ID: <20200730162159.GZ9247@paulmck-ThinkPad-P72>
Date:   Thu, 30 Jul 2020 09:21:59 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     "Joel Fernandes (Google)" <joel@...lfernandes.org>
Cc:     linux-kernel@...r.kernel.org,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        rcu@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH 1/2] rcu/tree: Add a warning if CPU being onlined did not
 report QS already

On Wed, Jul 29, 2020 at 11:02:20PM -0400, Joel Fernandes (Google) wrote:
> Add a warning if the CPU being onlined did not already report a QS. This is
> to simplify the code in the CPU onlining path and also to make it clear
> where QS is reported. The act of QS reporting in the CPU onlining path is
> likely unnecessary as shown by code reading and testing with rcutorture's
> TREE03 and hotplug parameters.

How about something like this for the commit log?

------------------------------------------------------------------------

Currently, rcu_cpu_starting() checks to see if the RCU core expects a
quiescent state from the incoming CPU.  However, the current interaction
between RCU quiescent-state reporting and CPU-hotplug operations should
mean that the incoming CPU never needs to report a quiescent state.
First, the outgoing CPU reports a quiescent state if needed.  Second,
the race where the CPU is leaving just as RCU is initializing a new
grace period is handled by an explicit check for this condition.  Third,
the CPU's leaf rcu_node structure's ->lock serializes these checks.

This means that if rcu_cpu_starting() ever feels the need to report
a quiescent state, then there is a bug somewhere in the CPU hotplug
code or the RCU grace-period handling code.  This commit therefore
adds a WARN_ON_ONCE() to bring that bug to everyone's attention.

------------------------------------------------------------------------
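To make the invariant above concrete, here is a toy userspace model (hypothetical
names, a single rcu_node, single-threaded, locking elided; this is not the kernel
code): the offline path reports any pending quiescent state for the outgoing CPU,
and grace-period initialization only waits on CPUs that are currently online, so
the check in the onlining path should never find the incoming CPU's bit still set.

	/*
	 * Toy userspace model of the invariant above -- not the kernel code.
	 * Hypothetical names, one rcu_node, single-threaded, locking elided.
	 */
	#include <assert.h>
	#include <stdio.h>

	static unsigned long qsmaskinitnext = 0xf; /* CPUs online for future GPs */
	static unsigned long qsmask;               /* CPUs the current GP waits on */

	static void gp_init(void)                  /* stands in for rcu_gp_init() */
	{
		qsmask = qsmaskinitnext;           /* only wait on online CPUs */
	}

	static void report_qs(int cpu)             /* stands in for rcu_report_qs_rnp() */
	{
		qsmask &= ~(1UL << cpu);
	}

	static void cpu_offline(int cpu)           /* outgoing-CPU path */
	{
		qsmaskinitnext &= ~(1UL << cpu);
		if (qsmask & (1UL << cpu))         /* rule 1: outgoing CPU reports QS */
			report_qs(cpu);
	}

	static void cpu_online(int cpu)            /* stands in for rcu_cpu_starting() */
	{
		qsmaskinitnext |= 1UL << cpu;
		assert(!(qsmask & (1UL << cpu)));  /* the proposed WARN_ON_ONCE() */
	}

	int main(void)
	{
		gp_init();       /* grace period starts with CPU 2 online */
		cpu_offline(2);  /* CPU 2 leaves and reports its QS on the way out */
		cpu_online(2);   /* CPU 2 returns; nothing left to report */
		printf("qsmask = %#lx\n", qsmask);
		return 0;
	}

In this simplified model the second rule shows up in gp_init(): a later grace
period simply includes the returning CPU again, so the onlining path itself
never needs to report a quiescent state.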

> Cc: Paul E. McKenney <paulmck@...nel.org>
> Cc: Neeraj Upadhyay <neeraju@...eaurora.org>
> Suggested-by: Paul E. McKenney <paulmck@...nel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> 
> ---
>  kernel/rcu/tree.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 65e1b5e92319..1e51962b565b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3996,7 +3996,19 @@ void rcu_cpu_starting(unsigned int cpu)
>  	rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
>  	rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
>  	rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> -	if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> +
> +	/*
> +	 * Delete QS reporting from here, by June 2021, if warning does not
> +	 * fire. Let us make the rules for reporting QS for an offline CPU
> +	 * more explicit. The CPU onlining path does not need to report QS for
> +	 * an offline CPU. Either the QS should have been reported during CPU
> +	 * offlining, or during rcu_gp_init() if it detected a race with either
> +	 * CPU offlining or task unblocking on previously offlined CPUs. Note
> +	 * that the FQS loop also does not report QS for an offline CPU any
> +	 * longer (unless it splats due to an offline CPU blocking the GP for
> +	 * too long).
> +	 */

Let's leave at least the WARN_ON_ONCE() indefinitely.  If you don't
believe me, remove this code in your local tree, have someone give you
several branches, some with bugs injected, and then try to figure out
which have the bugs and then try to find those bugs.

This is not a fastpath, so the overhead of the check is not a concern.
Believe me, the difficulty of bug location without this check is a very
real concern!  ;-)
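For reference, here is a minimal userspace stand-in for the WARN_ON_ONCE()
semantics being relied on (an illustrative sketch, not the kernel macro): the
condition is still evaluated on every call, which costs essentially nothing on
this slowpath, but the splat is printed only on the first hit, so even a buggy
tree cannot flood the console.

	/* Userspace stand-in for WARN_ON_ONCE() (GCC statement expression):
	 * evaluates the condition every time, warns only on the first hit,
	 * and returns the condition so it can be used inside an if (). */
	#include <stdbool.h>
	#include <stdio.h>

	#define MY_WARN_ON_ONCE(cond) ({				\
		static bool warned;					\
		bool ret = (cond);					\
		if (ret && !warned) {					\
			warned = true;					\
			fprintf(stderr, "WARNING: %s at %s:%d\n",	\
				#cond, __FILE__, __LINE__);		\
		}							\
		ret;							\
	})

	int main(void)
	{
		for (int i = 0; i < 4; i++)
			if (MY_WARN_ON_ONCE(i & 1))	/* warns once, at i == 1 */
				fprintf(stderr, "handled hit %d\n", i);
		return 0;
	}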

On the other hand, I fully agree with the benefits of documenting the
design rules.  But is this really the best place to do that from the
viewpoint of someone who is trying to figure out how RCU works?

							Thanx, Paul

> +	if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
>  		rcu_disable_urgency_upon_qs(rdp);
>  		/* Report QS -after- changing ->qsmaskinitnext! */
>  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> -- 
> 2.28.0.rc0.142.g3c755180ce-goog
> 
