linux-kernel - Re: [PATCH] rcu/tree: Improve comments in rcu_report_qs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <B0B5741B-2A89-4092-935B-39B23EFEDC27@joelfernandes.org>
Date:   Mon, 13 Feb 2023 09:43:22 -0500
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Zhang, Qiang1" <qiang1.zhang@...el.com>
Cc:     Frederic Weisbecker <frederic@...nel.org>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>,
        Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH] rcu/tree: Improve comments in rcu_report_qs_rdp()



> On Feb 11, 2023, at 6:18 AM, Zhang, Qiang1 <qiang1.zhang@...el.com> wrote:
> 
> 
>> 
>> 
>>> On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote:
>>> Recent discussion triggered due to a patch linked below, from Qiang,
>>> shed light on the need to accelerate from QS reporting paths.
>>> 
>>> Update the comments to capture this piece of knowledge.
>>> 
>>> Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
>>> Cc: Qiang Zhang <Qiang1.zhang@...el.com>
>>> Cc: Frederic Weisbecker <frederic@...nel.org>
>>> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
>>> 
>>> ---
>>> kernel/rcu/tree.c | 13 ++++++++++++-
>>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index 93eb03f8ed99..713eb6ca6902 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>      } else {
>>>              /*
>>>               * This GP can't end until cpu checks in, so all of our
>>> -              * callbacks can be processed during the next GP.
>>> +              * callbacks can be processed during the next GP. Do
>>> +              * the acceleration from here otherwise there may be extra
>>> +              * grace period delays, as any accelerations from rcu_core()
>>> +              * or note_gp_changes() may happen only after the GP after the
>>> +              * current one has already started. Further, rcu_core()
>>> +              * only accelerates if RCU is idle (no GP in progress).
>> 
>> Actually note_gp_changes() should take care of that.
>> 
>> You are referring to  rcu_core() -> rcu_check_quiescent_state() ->
>> note_gp_changes() doing the acceleration prior to the  rcu_core() ->
>> rcu_report_qs_rdp() call, correct?
>> 
>> Ah, but note_gp_changes() has an early return which triggers if either:
>> 1. The rnp spinlock trylock failed.
>> 2. The start of a new grace period was already detected before, so
>> rdp->gp_seq == rnp->gp_seq.
>> 
>> So I think it is possible that we are in the middle of a GP, and
>> rcu_core() is called because QS reporting is required for the CPU, and
>> say the current GP started we are in the middle off occurs from the
>> same CPU so rdp->gp_seq == rnp->gp_seq.
>> 
>> Now, rcu_core()'s call to note_gp_changes() should return early but
>> its later call to report_qs_rdp() will not accelerate the callback
>> without the code we are commenting here.
>> 
>> My gut feeling is that the
>> acceleration in rcu_report_qs_rdp() only stands for:
>> 
>> * callbacks that may be enqueued from an IRQ firing during the small window
>>  between the RNP unlock in note_gp_changes() and the RNP lock in
>>  rcu_report_qs_rdp()
> 
> For rdp which is in the middle of a de-offloading process, the bypass list have been
> flushed, the nocb kthreads may miss callbacks acceleration.   invoke call_rcu()
> will also not use bypass list. if at this time a new gp starts, before call rcu_report_qs_rdp()
> to report qs,  even if rcu_core() invoke note_gp_changes() notice gp start, this rdp's callback
> may still miss acceleration if rdp still in de-offloading process, because invoke rcu_rdp_is_offloaded()
> still return true.
> 
> I think this is also a reason.

I tend to agree with you. I am wondering the best way to document all these reasons. Perhaps it suffices to mention a few reasons briefly here, without going into too much detail (because details may be subject to change).

I will look through this entire thread again and take a call on how to proceed, but do let me know what you and Frederic think about the next steps. The main benefit of commenting is we dont look at this in a few years and run into the same question…

Thanks!

Joel

> 
> Thanks
> Zqiang
> 
>> 
>> Sure, this also seems like a valid reason.
>> 
>> * __note_gp_changes() got called even before from the GP kthread, and callbacks
>>  got enqueued between that and rcu_core().
>> 
>> Agreed. In this case we will take the early return in
>> note_gp_changes() when called from the rcu_core(). So yeah, that was
>> kind of my point as well but slightly different reasoning.
>> 
>> Let me know if you disagree with anything I mentioned, though.
>> 
>> - Joel