Message-ID: <0c721fa7-f458-42ae-b206-9168539b6356@nvidia.com>
Date: Sat, 5 Apr 2025 07:23:44 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>, Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>, Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [1/3] rcu: Replace magic number with meaningful constant in
rcu_seq_done_exact() - Patchwork
Hi Frederic,
On 4/2/2025 10:22 AM, Frederic Weisbecker wrote:
>> On Tue, 1 Apr 2025 16:27:36 GMT, Frederic Weisbecker wrote:
>>> On Mon, Mar 31, 2025 at 02:29:52PM -0700, Paul E. McKenney wrote:
>>>> The disagreement is a feature, at least up to a point. That feature
>>>> allows CPUs to go idle for long periods without RCU having to bother
>>>> them or to mess with their per-CPU data (give or take ->gpwrap). It also
>>>> allows per-rcu_node-leaf locking, which is important on large systems.
>>>>
>>>> Trying to make precisely globally agreed-on beginnings and ends of
>>>> RCU grace periods will not end well from performance, scalability,
>>>> or real-time-response viewpoints. ;-)
>>>
>>> The distributed disagreement is definitely a feature. The duplicate root
>>> is more debatable.
>>>
>>>> But simplifications that don't hurt performance, scalability, and
>>>> real-time-response are of course welcome.
>>>
>>> I'm not even sure my proposal is a simplification. Perhaps it is. Another
>>> hope is that it could avoid future accidents.
>>>
>>
>> Aside from the performance concerns:
>>
>> Sorry if this is silly but could you provide a small hint as to how
>> unifying the global counter with the node affects QS reporting or hotplug?
>> It was not immediately obvious to me. Thanks for the help.
>
> First of all rcu_seq_start() must be before the hotplug scan, otherwise
> you run into this:
>
> rcu_state.gp_seq = 4
>
>
> CPU 0 / rcu_gp_kthread()                CPU 1                                    CPU 2
> ------------------------                -----                                    -----
> //rcu_gp_init()
> rcu_for_each_leaf_node(rnp) {
>     raw_spin_lock_rcu_node(rnp);
>     rnp->qsmaskinit = rnp->qsmaskinitnext
>     raw_spin_unlock_rcu_node(rnp);
> }
>                                         rcutree_report_cpu_starting()
>                                             raw_spin_lock_rcu_node(rnp);
>                                             rnp->qsmaskinitnext |= rdp->grpmask
>                                             raw_spin_unlock_rcu_node(rnp);
>
>                                         rcu_read_lock()
>                                         r0 = *X
>                                                                                  r1 = *X
>                                                                                  X = NULL
>                                                                                  cookie = get_state_synchronize_rcu()
>                                                                                  //cookie = 8
> rcu_seq_start(&rcu_state.gp_seq)
> //rcu_state.gp_seq == 5
>
> rcu_for_each_node_breadth_first(rnp) {
>     raw_spin_lock_rcu_node(rnp);
>     // Ignore CPU 1
>     rnp->qsmask = rnp->qsmaskinit;
>     raw_spin_unlock_rcu_node(rnp);
> }
> [...]
>
> //rcu_gp_cleanup()
> rcu_seq_end(&rcu_state.gp_seq)
> // rcu_state.gp_seq == 8
>                                                                                  poll_state_synchronize_rcu(cookie)
>                                                                                  kfree(r1)
>                                         r2 = *r0 // CRASH
>
>
>
Makes sense! The GP would end up too short if we did rcu_seq_start() but missed
an incoming CPU that happened to start a reader before the rcu_seq_start().
> So the same applies if we convert rcu_state to use the root node.
> But if we do rcu_seq_start() on the root node, then an update side
> can call note_gp_changes() because of the state change (only if the
> root node is also the unique leaf). But this then happens before
> the loop that initializes all the ->qsmask.
>
> It's not a correctness problem because it won't make the rdp
> report a QS too early, since rnp->qsmask isn't initialized anyway,
> but note_gp_changes() would needlessly lock the rnp lock to record
> the state change in rdp->gp_seq.
>
> This is why we need an intermediate state called RCU_SEQ_STARTED
> during which note_gp_changes() can safely ignore the state change.
Ah ok, so it is not a correctness problem; that's what I was wondering. It
sounds more like a performance optimization.
>
> Then once the root's qsmask is initialized, the state can switch
> to RCU_SEQ_WAIT_QS, after which calling note_gp_changes() becomes
> useful.
I agree, it is helpful to track the initialization through states and check for
that in code paths.
> Hope that helped.
It does, thanks!
- Joel