linux-kernel - Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline

Message-Id: <20180626182950.GH3593@linux.vnet.ibm.com>
Date:   Tue, 26 Jun 2018 11:29:50 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org,
        jiangshanlai@...il.com, dipankar@...ibm.com,
        akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
        josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
        dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
        oleg@...hat.com, joel@...lfernandes.org
Subject: Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to
 race with CPU offline

On Tue, Jun 26, 2018 at 07:51:19PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 26, 2018 at 10:10:39AM -0700, Paul E. McKenney wrote:
> > Without special fail-safe quiescent-state-propagation checks, grace-period
> > hangs can result from the following scenario:
> > 
> > 1.	CPU 1 goes offline.
> > 
> > 2.	Because CPU 1 is the only CPU in the system blocking the current
> > 	grace period, the grace period ends as soon as
> > 	rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp() returns.
> > 
> > 3.	At this point, the leaf rcu_node structure's ->lock is no longer
> > 	held: rcu_report_qs_rnp() has released it, as it must in order
> > 	to awaken the RCU grace-period kthread.
> > 
> > 4.	At this point, that same leaf rcu_node structure's ->qsmaskinitnext
> > 	field still records CPU 1 as being online.  This is absolutely
> > 	necessary because the scheduler uses RCU, and ->qsmaskinitnext
> 
> Can you expand a bit on this, where does the scheduler care about the
> online state of the CPU that's about to call into arch_cpu_idle_dead()?

Because the CPU does a context switch between the time that the CPU gets
marked offline from the viewpoint of cpu_offline() and the time that
the CPU finally makes it to arch_cpu_idle_dead().  Plus reporting the
quiescent state (rcu_report_qs_rnp()) can result in waking up RCU's
grace-period kthread.  During that context switch and that wakeup,
the scheduler needs RCU to continue paying attention to the outgoing
CPU, right?

> > 	contains RCU's idea as to which CPUs are online.  Therefore,
> > 	invoking rcu_report_qs_rnp() after clearing CPU 1's bit from
> > 	->qsmaskinitnext would result in a lockdep-RCU splat due to
> > 	RCU being used from an offline CPU.
> > 
> > 5.	RCU's grace-period kthread awakens, sees that the old grace period
> > 	has completed and that a new one is needed.  It therefore starts
> > 	a new grace period, but because CPU 1's leaf rcu_node structure's
> > 	->qsmaskinitnext field still shows CPU 1 as being online, this new
> > 	grace period is initialized to wait for a quiescent state from the
> > 	now-offline CPU 1.
> 
> If we're past cpuhp_report_idle_dead() -> rcu_report_dead(), then
> cpu_offline() is true. Is that not sufficient state to avoid this?

Not from what I can see.  To avoid this, I need to synchronize
with rcu_gp_init(), but I cannot rely on the usual rcu_node ->lock
synchronization without severely complicating quiescent-state reporting.
For one thing, quiescent-state reporting can require waking up the
grace-period kthread, which cannot be done while holding any rcu_node
->lock due to deadlock.  I -could- defer the wakeup (as is done in
several other places), but adding the separate lock is much simpler,
and given that both grace-period initialization and CPU hotplug are
relatively rare operations, the extra overhead is way down in the noise.
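
For illustration only, here is a minimal userspace sketch of the
separate-lock approach, using pthread mutexes in place of kernel
spinlocks.  Every name in it (toy_rnp, toy_ofl_lock, toy_cpu_offline(),
toy_gp_init(), and so on) is a hypothetical stand-in rather than the
code from the patch.  The outer lock is held across both the
quiescent-state report and the clearing of the ->qsmaskinitnext bit,
so grace-period initialization cannot take its snapshot in between,
and the wakeup still happens with the per-node lock dropped.

```c
#include <pthread.h>

/* Hypothetical userspace stand-ins, not the kernel's types or names. */
struct toy_rnp {
	pthread_mutex_t lock;		/* stands in for the rcu_node ->lock */
	unsigned long qsmask;		/* CPUs the current GP still waits on */
	unsigned long qsmaskinitnext;	/* CPUs RCU believes to be online */
};

/* The separate lock: serializes CPU offline against GP initialization. */
static pthread_mutex_t toy_ofl_lock = PTHREAD_MUTEX_INITIALIZER;

/* One leaf node: CPUs 0 and 1 online, current GP waiting only on CPU 1. */
static struct toy_rnp toy_leaf = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.qsmask = 0x2,
	.qsmaskinitnext = 0x3,
};

static int toy_gp_wakeups;	/* counts simulated GP-kthread wakeups */

static void toy_wake_gp_kthread(void)
{
	toy_gp_wakeups++;
}

/* Called with rnp->lock held; returns with rnp->lock released. */
static void toy_report_qs(struct toy_rnp *rnp, unsigned long mask)
{
	int gp_done;

	rnp->qsmask &= ~mask;
	gp_done = (rnp->qsmask == 0);
	pthread_mutex_unlock(&rnp->lock);	/* drop before any wakeup */
	if (gp_done)
		toy_wake_gp_kthread();
}

/* Outgoing CPU: report its quiescent state, then mark it offline. */
static void toy_cpu_offline(struct toy_rnp *rnp, int cpu)
{
	unsigned long mask = 1UL << cpu;

	pthread_mutex_lock(&toy_ofl_lock);
	pthread_mutex_lock(&rnp->lock);
	if (rnp->qsmask & mask) {
		toy_report_qs(rnp, mask);	/* releases rnp->lock */
		pthread_mutex_lock(&rnp->lock);
	}
	rnp->qsmaskinitnext &= ~mask;		/* only now "offline" to RCU */
	pthread_mutex_unlock(&rnp->lock);
	pthread_mutex_unlock(&toy_ofl_lock);
}

/* GP initialization: snapshot the set of CPUs to wait on. */
static void toy_gp_init(struct toy_rnp *rnp)
{
	pthread_mutex_lock(&toy_ofl_lock);	/* excludes a half-done offline */
	pthread_mutex_lock(&rnp->lock);
	rnp->qsmask = rnp->qsmaskinitnext;
	pthread_mutex_unlock(&rnp->lock);
	pthread_mutex_unlock(&toy_ofl_lock);
}
```

Calling toy_cpu_offline(&toy_leaf, 1) and then toy_gp_init(&toy_leaf)
leaves the new grace period waiting only on CPU 0.  In a threaded run,
toy_gp_init() would block on toy_ofl_lock until the outgoing CPU's bit
had been cleared.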

Or am I missing a trick here?

							Thanx, Paul

> > 6.	Without the fail-safe force-quiescent-state checks, there would
> > 	be no quiescent state from the now-offline CPU 1, which would
> > 	eventually result in RCU CPU stall warnings and memory exhaustion.
>
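
The six-step scenario quoted above can be replayed as a small
single-threaded model.  This is a rough sketch under simplifying
assumptions: one leaf with two CPUs, bitmasks standing in for the
->qsmask and ->qsmaskinitnext fields, and hypothetical helper names
(sim_offline_race(), sim_failsafe_scan()) that do not exist in the
kernel.  The fail-safe is modeled simply as dropping any CPU that RCU
no longer considers online from the set being waited on.

```c
#include <stdbool.h>

/* Toy model of one leaf rcu_node; the layout is an assumption. */
struct sim_leaf {
	unsigned long qsmask;		/* CPUs the current GP waits on */
	unsigned long qsmaskinitnext;	/* CPUs RCU believes to be online */
};

/*
 * Replay the scenario and return the CPUs the new grace period is
 * still waiting on after every online CPU has reported.  A nonzero
 * result means the grace period hangs on an offline CPU.
 */
static unsigned long sim_offline_race(bool gp_init_wins_race)
{
	const unsigned long cpu0 = 1UL << 0, cpu1 = 1UL << 1;
	struct sim_leaf leaf = {
		.qsmask = cpu1,			/* only CPU 1 blocks the GP */
		.qsmaskinitnext = cpu0 | cpu1,	/* both CPUs look online */
	};

	/* Steps 1-3: CPU 1 reports its quiescent state; the old GP ends. */
	leaf.qsmask &= ~cpu1;

	if (gp_init_wins_race) {
		/* Steps 4-5: new GP starts while CPU 1's bit is still set. */
		leaf.qsmask = leaf.qsmaskinitnext;
		leaf.qsmaskinitnext &= ~cpu1;
	} else {
		/* Serialized order: the bit is cleared before GP init. */
		leaf.qsmaskinitnext &= ~cpu1;
		leaf.qsmask = leaf.qsmaskinitnext;
	}

	/* CPU 0 is still online and eventually reports. */
	leaf.qsmask &= ~cpu0;

	/* Step 6: whatever remains can only be the offline CPU 1. */
	return leaf.qsmask;
}

/* Toy fail-safe: stop waiting on CPUs no longer marked online. */
static unsigned long sim_failsafe_scan(unsigned long qsmask,
				       unsigned long qsmaskinitnext)
{
	return qsmask & qsmaskinitnext;
}
```

With gp_init_wins_race set, the model ends up waiting on CPU 1's bit
alone, which is the hang described in step 6.  Either the serialized
ordering or the toy fail-safe scan clears it.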