Message-ID: <20180626200745.GR2494@hirez.programming.kicks-ass.net>
Date: Tue, 26 Jun 2018 22:07:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: linux-kernel@...r.kernel.org, mingo@...nel.org,
jiangshanlai@...il.com, dipankar@...ibm.com,
akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
oleg@...hat.com, joel@...lfernandes.org
Subject: Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline

On Tue, Jun 26, 2018 at 11:29:50AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 26, 2018 at 07:51:19PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 26, 2018 at 10:10:39AM -0700, Paul E. McKenney wrote:
> > > Without special fail-safe quiescent-state-propagation checks, grace-period
> > > hangs can result from the following scenario:
> > >
> > > 1. CPU 1 goes offline.
> > >
> > > 2. Because CPU 1 is the only CPU in the system blocking the current
> > > grace period, the grace period ends as soon as
> > > rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp() returns.
> > >
> > > 3. At this point, the leaf rcu_node structure's ->lock is no longer
> > > held: rcu_report_qs_rnp() has released it, as it must in order
> > > to awaken the RCU grace-period kthread.
> > >
> > > 4. At this point, that same leaf rcu_node structure's ->qsmaskinitnext
> > > field still records CPU 1 as being online. This is absolutely
> > > necessary because the scheduler uses RCU, and ->qsmaskinitnext
> >
> > Can you expand a bit on this, where does the scheduler care about the
> > online state of the CPU that's about to call into arch_cpu_idle_dead()?
>
> Because the CPU does a context switch between the time that the CPU gets
> marked offline from the viewpoint of cpu_offline() and the time that
> the CPU finally makes it to arch_cpu_idle_dead(). Plus reporting the
> quiescent state (rcu_report_qs_rnp()) can result in waking up RCU's
> grace-period kthread. During that context switch and that wakeup,
> the scheduler needs RCU to continue paying attention to the outgoing
> CPU, right?

What you say is right, but I'm confused as to its relevance. Afaict 2 above is:

  do_idle()
    if (cpu_offline()) // true
      cpuhp_report_idle_dead()
        rcu_report_dead()
          rcu_cleanup_dying_idle_cpu()
      arch_cpu_idle_dead()

There is no scheduling between that and the slightly later call to
arch_cpu_idle_dead(); we're in the middle of the idle task and preemption
is firmly disabled.

AFAICT rcu_cleanup_dying_idle_cpu() can mark your CPU as offline, since it's
about to die. Also, we have a comment in cpuhp_report_idle_dead() saying that
we can't use complete() because RCU just took our CPU out.
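
The ordering argued above can be sketched as a toy userspace model. This is not kernel code. The field names qsmask and qsmaskinitnext come from the thread, but every toy_* function, the preemption flag, and the context-switch counter are hypothetical stand-ins, and the real locking and wakeup paths are left out entirely. The sketch assumes, as suggested above, that the dying CPU can report its quiescent state and clear its own online bit back to back, with nothing scheduled in between.

```c
/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_rnp {
	unsigned long qsmask;         /* CPUs still blocking the current grace period */
	unsigned long qsmaskinitnext; /* CPUs RCU still treats as online */
};

bool toy_preempt_disabled;
int toy_context_switches;

/* Clear this CPU's bit; return true if nothing blocks the grace period now. */
bool toy_report_qs(struct toy_rnp *rnp, unsigned long mask)
{
	rnp->qsmask &= ~mask;
	return rnp->qsmask == 0;
}

/* Assumption: the outgoing CPU may mark itself offline from RCU's viewpoint. */
void toy_mark_offline(struct toy_rnp *rnp, unsigned long mask)
{
	rnp->qsmaskinitnext &= ~mask;
}

/* Hypothetical scheduling point: a switch only counts if preemption is on. */
void toy_maybe_schedule(void)
{
	if (!toy_preempt_disabled)
		toy_context_switches++;
}

/* Models the offline branch of the idle loop shown in the call chain above. */
bool toy_idle_dead_path(struct toy_rnp *rnp, int cpu)
{
	unsigned long mask;
	bool gp_done;

	assert(cpu >= 0 && cpu < (int)(sizeof(unsigned long) * 8));
	mask = 1UL << cpu;

	toy_preempt_disabled = true;        /* idle task, preemption disabled */
	gp_done = toy_report_qs(rnp, mask); /* rcu_cleanup_dying_idle_cpu() analogue */
	toy_mark_offline(rnp, mask);        /* the step suggested above */
	toy_maybe_schedule();               /* cannot switch: preemption is off */
	return gp_done;                     /* arch_cpu_idle_dead() would follow */
}
```

Starting from a leaf where only CPU 1 blocks the grace period and CPUs 0 and 1 are online, calling toy_idle_dead_path(&rnp, 1) reports that the grace period may end, leaves only CPU 0 in qsmaskinitnext, and records no context switch. Whether the real code can afford to clear the online bit at that point is exactly the question under discussion, so the model illustrates the claim rather than settling it.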