Message-ID: <68fd874e-ac9b-4d91-ab05-903bf45856fa@paulmck-laptop>
Date: Thu, 22 Jan 2026 21:46:58 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: linux-kernel@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>,
	rcu@...r.kernel.org, Frederic Weisbecker <frederic@...nel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
	Josh Triplett <josh@...htriplett.org>,
	Uladzislau Rezki <urezki@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Zqiang <qiang.zhang@...ux.dev>
Subject: Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload
 handling

On Thu, Jan 22, 2026 at 08:29:41PM -0500, Joel Fernandes wrote:
> On Jan 22, 2026, at 7:18 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
> > On Thu, Jan 22, 2026 at 06:43:31PM -0500, Joel Fernandes wrote:
> > > On Thu, Jan 22, 2026 at 01:55:11PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
> > > > > -    } else if (len > rdp->qlen_last_fqs_check + qhimark) {
> > > > > -        /* ... or if many callbacks queued. */
> > > > > -        rdp->qlen_last_fqs_check = len;
> > > > > -        j = jiffies;
> > > > > -        if (j != rdp->nocb_gp_adv_time &&
> > > > > -            rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> > > > This places in cur_gp_seq not the grace period for the current callback
> > > > (which would be unlikely to have completed), but rather the grace period
> > > > for the oldest callback that has not yet been marked as done.  And that
> > > > callback's grace period started some time ago, and thus might well have
> > > > completed.
> > > > So while this code might not have been executed in your tests, it is
> > > > definitely not a logical contradiction.
> > > > Or am I missing something subtle here?
> > > 
> > > You're right that it's not a logical contradiction - I was imprecise.
> > > rcu_segcblist_nextgp() returns the GP for the oldest pending callback,
> > > which could indeed have completed.
> > > 
> > > However, the question becomes: under what scenario do we need to advance
> > > here? If that GP completed, rcuog should have already advanced those
> > > callbacks. The only way this code path can execute is if rcuog is starved
> > > and not running to advance them, right?
> > 
> > That is one way.  The other way is if the RCU grace-period kthread gets
> > delayed (perhaps by vCPU preemption) between the time that it updates the
> > leaf rcu_node structure's ->gp_seq field and the time that it invokes
> > rcu_nocb_gp_cleanup().
> 
> I see the window you're describing. In rcu_gp_cleanup(), for each leaf node:
> 
>     WRITE_ONCE(rnp->gp_seq, new_gp_seq);   // GP appears complete
>     ...
>     raw_spin_unlock_irq_rcu_node(rnp);
> 
>     /* vCPU preemption  */
>     rcu_nocb_gp_cleanup(sq);               // wakes rcuog
> 
> So yes, in this window, the call_rcu() CPU could see the updated gp_seq and
> have rcu_seq_done() return true for the now-completed GP.
> 
> However, even in this window, advancing callbacks doesn't help:
> 
> 1. We advance callbacks from WAIT to DONE state
> 2. But rcuog is still sleeping, waiting for GP kthread to wake it
> 3. rcuoc is still sleeping, waiting for rcuog to wake it
> 4. Callbacks sit in DONE state but nobody invokes them
> 
> So the critical path is unchanged:
>    swake_up_all() -> rcuog -> rcuoc -> invoke.
> 
> I guess this is the redundancy argument - the window exists, but
> exploiting it provides no meaningful benefit AFAICS.

I gave you a Reviewed-by for this one and reaffirm my Reviewed-by for
the other two.  But you break it, you buy it!  ;-)

> > > But as Frederic pointed out, even if rcuog is starved, advancing here
> > > doesn't help - rcuog must still run anyway to wake the callback thread.
> > > We're just duplicating work it will do when it finally gets to run.
> > 
> > So maybe we don't want that first patch after all?  ;-)
> 
> Do you mean we want the first patch so that it can remove the code that we
> don't want?

It would wake up the rcuog kthread if the RCU grace-period kthread was
slow to do so.  But I agree with simplifying it and working out how to
make it more robust as a separate effort.
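
For reference, the branch in question looked roughly like the following.
This is a from-memory sketch rather than the exact tree contents, so treat
the helper names beyond those visible in the quoted hunk as approximations:

	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
		/* Many callbacks queued: see whether we can advance them. */
		rdp->qlen_last_fqs_check = len;
		j = jiffies;
		if (j != rdp->nocb_gp_adv_time &&
		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
		    rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
			/* Oldest pending GP already over: move cbs to DONE. */
			rcu_advance_cbs_nowake(rdp->mynode, rdp);
			rdp->nocb_gp_adv_time = j;
		}
		/* Then (force-)wake the rcuog kthread if it is still asleep. */
		wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
				   TPS("WakeOvfIsDeferred"));
	}

So even if the advance itself is redundant, the deferred force-wake is the
part that papers over a slow grace-period kthread.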

> > > The extensive testing (300K callback floods, hours of rcutorture) showed
> > > zero hits of this path, which suggests the window is practically
> > > unreachable. I can update the commit message to remove the "logical
> > > contradiction" claim and focus on the redundancy argument instead.
> > 
> > That would definitely be good!
> 
> Thanks. I will focus on this argument, then. I will resend with a better
> patch description in the morning.

And my Reviewed-by does assume that change, so go ahead and send the
improved commit log with my Reviewed-by appended.

> > > Would that address your concern?
> > 
> > Your point about the rcuoc kthread needing to be awakened is a good one.
> > I am still concerned about flooding on busy systems, especially if the
> > busy component is an underlying hypervisor, but we might need a more
> > principled approach for that situation.
> 
> Hmm true. There is also the case where any of the kthreads on the callback's
> critical path gets preempted by the hypervisor, which would be just as
> problematic, to your point about requiring a more principled approach. I
> guess we did not want the reader-side vCPU preemption workarounds either,
> for a similar reason.

Well, principles only get you so far.  We need both the principles and the
pragmatism to know when to depart from those principles when warranted.

> One trick I found, irrespective of virtualization, is that rcu_nocb_poll can
> result in grace periods completing faster. I think this could help overload
> situations by retiring callbacks sooner rather than later. I can experiment
> with this idea in the future. I was considering a dynamic trigger to enable
> polling mode under overload. I guess there is only one way to find out how
> well this will work, but initial testing does look promising. :-D

Careful of the effect on power consumption, especially for the world of
battery-powered embedded systems!  ;-)
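
And just to be concrete: today rcu_nocb_poll is an all-or-nothing boot-time
switch, for example (the cpulist is purely illustrative):

	rcu_nocbs=0-7 rcu_nocb_poll

so the dynamic overload-triggered variant you describe would need new
plumbing on top of that.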

							Thanx, Paul
