Message-ID: <Z-1IJ34Xb6RzucCQ@localhost.localdomain>
Date: Wed, 2 Apr 2025 16:22:31 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>,
Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [1/3] rcu: Replace magic number with meaningful constant in
rcu_seq_done_exact() - Patchwork
On Tue, Apr 01, 2025 at 12:30:40PM -0400, Joel Fernandes wrote:
> Hello, Frederic,
>
> On Tue, 1 Apr 2025 16:27:36 GMT, Frederic Weisbecker wrote:
> > > On Mon, Mar 31, 2025 at 02:29:52PM -0700, Paul E. McKenney wrote:
> > > The disagreement is a feature, at least up to a point. That feature
> > > allows CPUs to go idle for long periods without RCU having to bother
> > > them or to mess with their per-CPU data (give or take ->gpwrap). It also
> > > allows per-rcu_node-leaf locking, which is important on large systems.
> > >
> > > Trying to make precisely globally agreed-on beginnings and ends of
> > > RCU grace periods will not end well from performance, scalability,
> > > or real-time-response viewpoints. ;-)
> >
> > The distributed disagreement is definitely a feature. The duplicate root
> > is more debatable.
> >
> > > But simplifications that don't hurt performance, scalability, and
> > > real-time-response are of course welcome.
> >
> > I'm not even sure my proposal is a simplification. Perhaps it is. Another
> > hope is that it could avoid future accidents.
> >
>
> Aside from the performance concerns:
>
> Sorry if this is silly but could you provide a small hint as to how
> unifying the global counter with the node affects QS reporting or hotplug?
> It was not immediately obvious to me. Thanks for the help.

First of all, rcu_seq_start() must happen before the hotplug scan, otherwise
you run into this:

rcu_state.gp_seq = 4

CPU 0 / rcu_gp_kthread()            CPU 1                               CPU 2
------------------------            -----                               -----
//rcu_gp_init()
rcu_for_each_leaf_node(rnp) {
    raw_spin_lock_rcu_node(rnp);
    rnp->qsmaskinit = rnp->qsmaskinitnext
    raw_spin_unlock_rcu_node(rnp);
}
                                    rcutree_report_cpu_starting()
                                        raw_spin_lock_rcu_node(rnp);
                                        rnp->qsmaskinitnext |= rdp->grpmask
                                        raw_spin_unlock_rcu_node(rnp);
                                    rcu_read_lock()
                                    r0 = *X
                                                                        r1 = *X
                                                                        X = NULL
                                                                        cookie = get_state_synchronize_rcu()
                                                                        //cookie = 8
rcu_seq_start(&rcu_state.gp_seq)
//rcu_state.gp_seq == 5
rcu_for_each_node_breadth_first(rnp) {
    raw_spin_lock_rcu_node(rnp);
    // Ignore CPU 1
    rnp->qsmask = rnp->qsmaskinit;
    raw_spin_unlock_rcu_node(rnp);
}
[...]
//rcu_gp_cleanup()
rcu_seq_end(&rcu_state.gp_seq)
// rcu_state.gp_seq == 8
                                                                        poll_state_synchronize_rcu(cookie)
                                                                        kfree(r1)
                                    r2 = *r0 // CRASH
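
For reference, the gp_seq / cookie arithmetic behind the numbers above
(gp_seq: 4 -> 5 -> 8, cookie == 8) comes from the rcu_seq_*() helpers in
kernel/rcu/rcu.h. Here is a simplified user-space sketch of them (memory
barriers, READ_ONCE()/WRITE_ONCE() and the debug checks omitted), just to
make the example above self-contained:

/* The low RCU_SEQ_CTR_SHIFT bits of gp_seq encode the GP phase,
 * the upper bits count grace periods. */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)

static void rcu_seq_start_sketch(unsigned long *sp)
{
	*sp += 1;				/* 4 -> 5: GP now in progress */
}

static void rcu_seq_end_sketch(unsigned long *sp)
{
	*sp = (*sp | RCU_SEQ_STATE_MASK) + 1;	/* 5 -> 8: GP complete */
}

static unsigned long rcu_seq_snap_sketch(unsigned long *sp)
{
	/* Smallest gp_seq value at which a full GP has elapsed since now. */
	return (*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

static int rcu_seq_done_sketch(unsigned long *sp, unsigned long cookie)
{
	return *sp >= cookie;			/* ULONG_CMP_GE() in the kernel */
}

So with gp_seq still at 4, the snapshot taken by CPU 2 is 8, and the very
grace period whose hotplug scan already missed CPU 1's reader is enough to
satisfy the cookie once rcu_seq_end() moves gp_seq to 8.
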
So the same constraint applies if we convert rcu_state to use the root node.

But if we do rcu_seq_start() on the root node, then the update side can call
note_gp_changes() because of the state change (only if the root node is also
the unique leaf). This then happens before the loop that initializes all the
->qsmask fields.
It's not a correctness problem, because it won't make the rdp report a QS
too early (rnp->qsmask isn't initialized yet anyway), but note_gp_changes()
would needlessly take the rnp lock just to record the state change in
rdp->gp_seq.
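
For context, the fast path in question looks roughly like this (a from-memory
sketch of note_gp_changes() in kernel/rcu/tree.c, not a verbatim quote; the
->gpwrap test and the GP-kthread wakeup are left out):

static void note_gp_changes(struct rcu_data *rdp)
{
	struct rcu_node *rnp = rdp->mynode;
	unsigned long flags;

	local_irq_save(flags);
	/* Nothing changed on our node?  Then stay lockless. */
	if (rdp->gp_seq == rcu_seq_current(&rnp->gp_seq) ||
	    !raw_spin_trylock_rcu_node(rnp)) {
		local_irq_restore(flags);
		return;
	}
	/* Otherwise take the lock and record the new gp_seq, qsmask, etc. */
	__note_gp_changes(rnp, rdp);
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}

With a unified root on a single-node system, the rcu_seq_start() bump would
make that first test fail on every CPU before ->qsmask is even set up, hence
the needless lock acquisitions.
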
This is why we need an intermediate state called RCU_SEQ_STARTED
during which note_gp_changes() can safely ignore the state change.
Then, once the root's ->qsmask is initialized, the state can switch
to RCU_SEQ_WAIT_QS, after which calling note_gp_changes() becomes
useful.
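
Purely to illustrate the idea (the state names below are the proposal, not
existing code; rcu_seq_state() and rcu_seq_current() exist today, and
rcu_seq_set_state() is, if I remember right, the helper SRCU already uses to
rewrite the low state bits):

/* Hypothetical gp_seq state values, stored in the low RCU_SEQ_STATE_MASK bits. */
#define RCU_SEQ_IDLE		0	/* no GP in progress */
#define RCU_SEQ_STARTED		1	/* rcu_seq_start() done, ->qsmask not yet initialized */
#define RCU_SEQ_WAIT_QS		2	/* ->qsmask initialized, QS reports are useful */

/* In note_gp_changes(), before even trying the rnp lock: */
if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) == RCU_SEQ_STARTED)
	return;		/* GP not ready yet, nothing useful to record */

/* In rcu_gp_init(), once the root's ->qsmask has been set up: */
rcu_seq_set_state(&rnp_root->gp_seq, RCU_SEQ_WAIT_QS);
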
Hope that helped.
Thanks.