Message-ID: <Y9AOMhy4GCyaqxdG@google.com>
Date: Tue, 24 Jan 2023 16:58:26 +0000
From: Joel Fernandes <joel@...lfernandes.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: "Zhang, Qiang1" <qiang1.zhang@...el.com>,
"Paul E. McKenney" <paulmck@...nel.org>, quic_neeraju@...cinc.com,
rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] rcu: Remove impossible wakeup rcu GP kthread action
from rcu_report_qs_rdp()
Hi Frederic,
On Mon, Jan 23, 2023 at 10:11:15PM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote:
> > > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@...nel.org> wrote:
> > >
> > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote:
> > >>> What am I missing?
> > >>
> > >> That the acceleration is also done by __note_gp_changes() once the
> > >> grace period ends anyway, so if any acceleration was missed as you
> > >> say, it will be done anyway.
> > >>
> > >> Also it is done by scheduler tick raising softirq:
> > >>
> > >> rcu_pending() does this:
> > >> /* Has RCU gone idle with this CPU needing another grace period? */
> > >> if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> > >> !rcu_rdp_is_offloaded(rdp) &&
> > >> !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >> return 1;
> > >>
> > >> and rcu_core():
> > >>         /* No grace period and unregistered callbacks? */
> > >>         if (!rcu_gp_in_progress() &&
> > >>             rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
> > >>                 rcu_nocb_lock_irqsave(rdp, flags);
> > >>                 if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >>                         rcu_accelerate_cbs_unlocked(rnp, rdp);
> > >>                 rcu_nocb_unlock_irqrestore(rdp, flags);
> > >>         }
> > >>
> > >> So, I am not sure if you need needacc at all. Those CBs that have not
> > >> been assigned grace period numbers will be taken care of :)
> > >
> > > But that's only when there is no grace period pending, so it can't happen while
> > > we report a QS.
> > >
> > > OTOH without the needacc, those callbacks waiting to be accelerated would be
> > > eventually processed but only on the next tick following the end of a grace
> > > period...if none has started since then. So if someone else starts a new GP
> > > before the current CPU, we must wait another GP, etc...
> > >
> > > That's potentially dangerous.
> >
> > Waiting for just one more GP cannot be dangerous IMO. Anyway, there is no
> > guarantee that a callback will run immediately at the end of a GP; there may
> > be one or more GPs before the callback can run, if I remember correctly.
> > That is by design... but please correct me if my understanding differs from yours.
>
> It's not bound to just one GP. If you have N CPUs flooding callbacks for a
> long while, your CPU has 1/N chances to be the one starting the next GP on
> each turn. Relying on the acceleration to be performed only when no GP is
> running may theoretically starve your callbacks forever.
I tend to agree with you; however, I was mostly referring to the "needacc". The
theoretical case is even more remote because you have to be halfway through the
transition _and_ flooding at the same time, such that RCU never gets a chance
to be idle for a long period of time (which we don't really see on our systems
per se). But I agree: even if theoretical, it may be better to handle it.
> > > And unfortunately we can't do the acceleration from __note_gp_changes()
> > > due to lock ordering restrictions: nocb_lock -> rnp_lock
> > >
Yeah, I was more going for the case where the offload-to-deoffload transition
had already completed before note_gp_changes() was called, which obviously we
cannot assume...
But yeah, for the 'limbo between offloaded and not-offloaded' state: eventually
RCU goes idle, and either the scheduling-clock interrupt (via rcu_pending())
will raise the softirq, or note_gp_changes() will accelerate. That does not
help the theoretical flooding case above, but still...
> >
> > Ah. This part I am not sure about. I would appreciate it if you could point
> > me to any old archive links or documentation detailing that, if possible…
>
> It's not documented but the code in nocb_gp_wait() or nocb_cb_wait() has
> that locking order for example.
>
> Excerpt:
>
> 	rcu_nocb_lock_irqsave(rdp, flags);
> 	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
> 	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> 	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> 		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
> 		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> 	}
>
> Thanks.
Ah, I see what you mean now: in other words, there is a chance of an ABBA
deadlock if we do not acquire the locks in a consistent order. Thank you for
clarifying. Indeed, rcutree_migrate_callbacks() also confirms it.
I guess, to Frederic's point, another case besides CB flooding that can starve
CB accelerations is when the system stays in the 'limbo between offload and
de-offload' state for a long period of time. In that situation, neither
note_gp_changes() nor the scheduling-clock interrupt (via the softirq) will
help accelerate CBs.
Whether such a limbo state is possible indefinitely, I am not sure...
Thanks a lot for the discussion; it is always a learning experience talking
through these things!
thanks,
- Joel