linux-kernel - Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling

Message-ID: <a0b81523-636b-46e4-88a0-4cc4ddad4ea4@paulmck-laptop>
Date: Thu, 22 Jan 2026 16:12:27 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: linux-kernel@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>,
	rcu@...r.kernel.org, Frederic Weisbecker <frederic@...nel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
	Josh Triplett <josh@...htriplett.org>,
	Uladzislau Rezki <urezki@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Zqiang <qiang.zhang@...ux.dev>
Subject: Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload
 handling

On Thu, Jan 22, 2026 at 06:43:31PM -0500, Joel Fernandes wrote:
> On Thu, Jan 22, 2026 at 01:55:11PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
> > > -	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
> > > -		/* ... or if many callbacks queued. */
> > > -		rdp->qlen_last_fqs_check = len;
> > > -		j = jiffies;
> > > -		if (j != rdp->nocb_gp_adv_time &&
> > > -		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> >
> > This places in cur_gp_seq not the grace period for the current callback
> > (which would be unlikely to have finished), but rather the grace period
> > for the oldest callback that has not yet been marked as done.  And that
> > callback started some time ago, and thus might well have finished.
> >
> > So while this code might not have been executed in your tests, it is
> > definitely not a logical contradiction.
> >
> > Or am I missing something subtle here?
> 
> You're right that it's not a logical contradiction - I was imprecise.
> rcu_segcblist_nextgp() returns the GP for the oldest pending callback,
> which could indeed have completed.
> 
> However, the question becomes: under what scenario do we need to advance
> here? If that GP completed, rcuog should have already advanced those
> callbacks. The only way this code path can execute is if rcuog is starved
> and not running to advance them, right?

That is one way.  The other way is if the RCU grace-period kthread gets
delayed (perhaps by vCPU preemption) between the time that it updates
the leaf rcu_node structure's ->gp_seq field and the time that it invokes
rcu_nocb_gp_cleanup().
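
To make that window concrete, here is a toy user-space model, not kernel
code.  All of the toy_* names, the segment layout, and the sequence numbers
are assumptions made for illustration.  It only sketches the idea that the
enqueue path can see the oldest waiting grace period as complete in the
leaf node's sequence number before anything else has advanced the
callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy segments, loosely modeled on a segmented callback list (assumption). */
enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, NSEGS };

struct toy_cblist {
	int seglen[NSEGS];           /* number of callbacks in each segment */
	unsigned long gp_seq[NSEGS]; /* grace period each segment waits for */
};

/* Wrap-safe check: has sequence number "s" been reached by "cur"? */
static bool toy_seq_done(unsigned long cur, unsigned long s)
{
	return (long)(cur - s) >= 0;
}

/*
 * Report the grace period of the oldest callbacks not yet marked done.
 * Returns false if no callbacks are pending at all.
 */
static bool toy_nextgp(const struct toy_cblist *cl, unsigned long *gp)
{
	assert(cl && gp);
	if (cl->seglen[SEG_WAIT] + cl->seglen[SEG_NEXT_READY] +
	    cl->seglen[SEG_NEXT] == 0)
		return false;
	*gp = cl->gp_seq[SEG_WAIT];
	return true;
}

/*
 * Enqueue-side check: if the leaf node's sequence number says the oldest
 * waiting grace period has ended, move those callbacks to the done segment.
 * Returns true if anything was advanced.
 */
static bool toy_try_advance(struct toy_cblist *cl, unsigned long node_gp_seq)
{
	unsigned long gp;

	if (!toy_nextgp(cl, &gp) || !toy_seq_done(node_gp_seq, gp))
		return false;
	cl->seglen[SEG_DONE] += cl->seglen[SEG_WAIT];
	cl->seglen[SEG_WAIT] = cl->seglen[SEG_NEXT_READY];
	cl->gp_seq[SEG_WAIT] = cl->gp_seq[SEG_NEXT_READY];
	cl->seglen[SEG_NEXT_READY] = 0;
	return true;
}
```

With three callbacks waiting on sequence number 8, toy_try_advance()
does nothing while the node is still at 4.  It advances them as soon as
the node reaches 8, whether or not the cleanup step has run yet.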

> But as Frederic pointed out, even if rcuog is starved, advancing here
> doesn't help - rcuog must still run anyway to wake the callback thread.
> We're just duplicating work it will do when it finally gets to run.

So maybe we don't want that first patch after all?  ;-)

> The extensive testing (300K callback floods, hours of rcutorture) showing
> zero hits confirms this window is practically unreachable. I can update the
> commit message to remove the "logical contradiction" claim and focus on the
> redundancy argument instead.

That would definitely be good!

> Would that address your concern?

Your point about the rcuoc kthread needing to be awakened is a good one.
I am still concerned about flooding on busy systems, especially if the
busy component is an underlying hypervisor, but we might need a more
principled approach for that situation.

							Thanx, Paul