Message-ID: <20200611170335.GC4455@paulmck-ThinkPad-P72>
Date: Thu, 11 Jun 2020 10:03:35 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: Frederic Weisbecker <frederic@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Josh Triplett <josh@...htriplett.org>
Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Wed, Jun 10, 2020 at 09:32:03PM -0400, Joel Fernandes wrote:
> On Thu, Jun 04, 2020 at 03:10:30PM +0200, Frederic Weisbecker wrote:
> > On Tue, May 26, 2020 at 06:49:08PM -0400, Joel Fernandes wrote:
> > > On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
> > >
> > > > > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > > > > held to avoid races between local callbacks handling and kthread
> > > > > offloaded callbacks handling.
> > > > > nocb_cb kthread is first parked to avoid any future race with
> > > > > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > > > > so that the nocb_gp kthread ignores this rdp.
> > > >
> > > > nit: you mean cblist is set to non-offloaded mode right?
> > > >
> > > > Also, could you clarify better the rcu_barrier bits in the changelog. I know
> > > > there's some issue if the cblist has both offloaded and non-offloaded
> > > > callbacks, but it would be good to clarify this here better IMHO.
> > >
> > > And for archival purposes: rcu_barrier needs excluding here because it is
> > > possible that for a brief period of time, the callback kthread has been
> > > parked to do the mode-switch, and it could be executing a bunch of callbacks
> > > when it was asked to park.
> > >
> > > Meanwhile, more interrupts happen and more callbacks are queued which are now
> > > executing in softirq. This ruins the ordering of callbacks that rcu_barrier
> > > needs.
> >
> > I think in that case the callbacks would still be executed in order. We wait
> > for the kthread to park before switching to softirq callback execution.
>
> Ah ok, you are parking the CB kthread after the no-cb CBs are already
> invoked (that's when parkme() is called -- i.e. after rcu_do_batch() in the
> CB kthread runs).
>
> Yeah, I don't see the purpose of acquiring rcu_barrier mutex either now. Once
> you park, all CBs should have been invoked by the nocb CB thread right?
> kthread_park() waits for the thread to be parked before proceeding. And you
> don't de-offload before it is parked.

We absolutely must be able to execute callbacks out of order to avoid
OOM due to RCU callback floods. If we insist on executing callbacks
strictly in order, there will be a time during the switch when we are
not executing callbacks at all. If execution gets preempted at this
point, it is quite possibly game over due to OOM.

							Thanx, Paul
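
To illustrate the OOM concern above, here is a minimal userspace sketch,
not kernel code. It is a toy model under stated assumptions: callbacks
arrive at a fixed rate, one executor invokes a fixed number per tick, and
the handoff between executors lasts a fixed number of ticks (standing in
for a preempted switch). The struct flood_model type, its fields, and
peak_backlog() are hypothetical names invented for this example. With
overlap disabled nothing is invoked during the handoff window, which
models a strictly ordered switch; with overlap enabled some executor is
always running.

```c
/* Toy model of a callback flood across an executor handoff.
 * All names here are hypothetical; this is not the RCU implementation. */
struct flood_model {
	unsigned long arrivals_per_tick; /* callbacks queued each tick */
	unsigned long invoked_per_tick;  /* callbacks one executor invokes each tick */
	unsigned long handoff_start;     /* first tick of the handoff window */
	unsigned long handoff_len;       /* length of the handoff window, in ticks */
};

/*
 * Return the largest end-of-tick backlog seen over `ticks` ticks.
 *
 * overlap == 0: no callbacks are invoked inside the handoff window
 *               (the old executor has stopped, the new one has not started).
 * overlap != 0: an executor keeps invoking callbacks throughout the window.
 */
unsigned long peak_backlog(const struct flood_model *m,
			   unsigned long ticks, int overlap)
{
	unsigned long backlog = 0;
	unsigned long peak = 0;
	unsigned long t;

	for (t = 0; t < ticks; t++) {
		int in_window = t >= m->handoff_start &&
				t < m->handoff_start + m->handoff_len;
		unsigned long capacity = m->invoked_per_tick;

		if (in_window && !overlap)
			capacity = 0; /* the gap: nobody is invoking callbacks */

		backlog += m->arrivals_per_tick;
		backlog -= backlog < capacity ? backlog : capacity;

		if (backlog > peak)
			peak = backlog;
	}
	return peak;
}
```

In this model, with 100 arrivals and 100 invocations per tick and a
50-tick handoff, the no-overlap case peaks at a backlog of 5000 callbacks
and never recovers, while the overlapping case stays at zero. The longer
the gap is stretched by preemption, the larger the backlog grows.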
> > Initially it was to avoid callback ordering issues but I don't recall
> > exactly which. Maybe it wasn't actually needed. But anyway I'll keep it
> > for the next version where, for a brief period of time, nocb kthread will
> > be able to compete with callback execution in softirq.
>
> Which nocb kthread is competing? Do you mean GP or CB?
>
> Either way, could you clarify how softirqs compete? Until the thread is
> parked, you wouldn't de-offload. And once you de-offload, only then the
> softirq would be executing callbacks. So at any point of time, it is
> either the CB kthread executing CBs or the softirq executing CBs, not both.
> Or did I miss something?
>
> thanks,
>
> - Joel
>
>
> > I'll clarify that in the changelog.
> >
> > Thanks.