linux-kernel - Re: [PATCH v4 1/5] rcu/tree: Add a warning if CPU being onlined did not report QS already

Message-ID: <20200810154654.GJ4295@paulmck-ThinkPad-P72>
Date:   Mon, 10 Aug 2020 08:46:54 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     "Joel Fernandes (Google)" <joel@...lfernandes.org>
Cc:     linux-kernel@...r.kernel.org,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Jonathan Corbet <corbet@....net>,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        linux-doc@...r.kernel.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        peterz@...radead.org, Randy Dunlap <rdunlap@...radead.org>,
        rcu@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
        tglx@...utronix.de, vineethrp@...il.com
Subject: Re: [PATCH v4 1/5] rcu/tree: Add a warning if CPU being onlined did
 not report QS already

On Fri, Aug 07, 2020 at 01:07:18PM -0400, Joel Fernandes (Google) wrote:
> Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> quiescent state from the incoming CPU.  However, the current interaction
> between RCU quiescent-state reporting and CPU-hotplug operations should
> mean that the incoming CPU never needs to report a quiescent state.
> First, the outgoing CPU reports a quiescent state if needed.  Second,
> the race where the CPU is leaving just as RCU is initializing a new
> grace period is handled by an explicit check for this condition.  Third,
> the CPU's leaf rcu_node structure's ->lock serializes these checks.
> 
> This means that if rcu_cpu_starting() ever feels the need to report
> a quiescent state, then there is a bug somewhere in the CPU hotplug
> code or the RCU grace-period handling code.  This commit therefore
> adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
> 
> Cc: Paul E. McKenney <paulmck@...nel.org>
> Cc: Neeraj Upadhyay <neeraju@...eaurora.org>
> Suggested-by: Paul E. McKenney <paulmck@...nel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> ---
>  kernel/rcu/tree.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 65e1b5e92319..a49fa3b60faa 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu)
>  	rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
>  	rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
>  	rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> -	if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> +
> +	/*
> +	 * XXX: The following rcu_report_qs_rnp() is redundant. If the below
> +	 * warning does not fire, consider replacing it with the "else" block,
> +	 * by June 2021 or so (while keeping the warning). Refer to RCU's
> +	 * Requirements documentation for the rationale.

Let's suppose that this change is made, and further that in a year or
two the "if" statement below is replaced with its "else" block.

Now let's suppose that (some years after that) a hard-to-trigger bug
makes its way into RCU's CPU-hotplug code that would have resulted in
the WARN_ON_ONCE() triggering, but that this bug turns out to be not so
hard to trigger in certain large production environments.

Let's suppose further that you have moved on to where you are responsible
for one of these large production environments.  How would this
hypothetical RCU/CPU-hotplug bug manifest?

							Thanx, Paul

> +	 */
> +	if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
>  		rcu_disable_urgency_upon_qs(rdp);
>  		/* Report QS -after- changing ->qsmaskinitnext! */
>  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> -- 
> 2.28.0.236.gb10cc79966-goog
>