Message-ID: <67F2A635-B521-4D30-A60B-8DFEAFACBDF0@nvidia.com>
Date: Tue, 23 Dec 2025 01:33:12 +0000
From: Joel Fernandes <joelagnelf@...dia.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Frederic
Weisbecker <frederic@...nel.org>, Neeraj Upadhyay
<neeraj.upadhyay@...nel.org>, Josh Triplett <josh@...htriplett.org>, Boqun
Feng <boqun.feng@...il.com>, Uladzislau Rezki <urezki@...il.com>, Steven
Rostedt <rostedt@...dmis.org>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>, Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang@...ux.dev>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [PATCH RFC] rcu: Reduce synchronize_rcu() latency by reporting GP
kthread's CPU QS early
> On Dec 22, 2025, at 8:21 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Mon, Dec 22, 2025 at 07:30:39PM -0500, Joel Fernandes wrote:
>> The RCU grace period mechanism uses a two-phase FQS (Force Quiescent
>> State) design where the first FQS saves dyntick-idle snapshots and
>> the second FQS compares them. This results in long and unnecessary latency for
>> synchronize_rcu() on idle systems (two FQS waits of ~3ms each with 1000HZ)
>> even when a single FQS wait would have sufficed.
>>
>> Investigation showed that the GP kthread's CPU is often the holdout CPU
>> after the first FQS, because it cannot be detected as "idle": it is
>> actively running the FQS scan in the GP kthread.
>>
>> Therefore, at the start of the first FQS, immediately report a quiescent
>> state for the GP kthread's CPU using rcu_qs() + rcu_report_qs_rdp(). The
>> GP kthread cannot be in an RCU read-side critical section while running
>> the FQS scan, so this is safe and results in significant tail latency
>> improvements.
>>
>> I benchmarked 6 runs of 100 synchronize_rcu() calls each, using the
>> default FQS jiffies settings. The per-call latencies show good tail
>> latency improvements:
>>
>> Baseline (without fix):
>> | Run | Mean | Min | Max |
>> |-----|----------|----------|-----------|
>> | 1 | 4.036 ms | 3.509 ms | 7.973 ms |
>> | 2 | 4.049 ms | 3.904 ms | 8.003 ms |
>> | 3 | 4.033 ms | 1.160 ms | 10.083 ms |
>> | 4 | 3.993 ms | 3.145 ms | 4.093 ms |
>> | 5 | 3.988 ms | 2.675 ms | 4.123 ms |
>> | 6 | 4.019 ms | 3.894 ms | 5.845 ms |
>>
>> With fix:
>> | Run | Mean | Min | Max |
>> |-----|----------|----------|----------|
>> | 1 | 3.991 ms | 2.953 ms | 4.125 ms |
>> | 2 | 3.995 ms | 3.439 ms | 4.081 ms |
>> | 3 | 3.989 ms | 2.974 ms | 4.079 ms |
>> | 4 | 3.997 ms | 3.667 ms | 4.072 ms |
>> | 5 | 4.027 ms | 2.550 ms | 7.928 ms |
>> | 6 | 3.989 ms | 2.886 ms | 4.076 ms |
>>
>> The fix reduces worst-case latency because the second FQS wait no
>> longer runs when it is not needed.
>>
>> Tested rcutorture TREE and SRCU configurations.
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
>
> Nice results!!!
Thanks!
>
> But why not do this at the end of rcu_gp_init()?
Yes, that is better; I will give that a try. Thanks,
- Joel
>
> Thanx, Paul
>
>> ---
>> kernel/rcu/tree.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index 8293bae1dec1..c116ed7633d3 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -160,6 +160,7 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
>> unsigned long gps, unsigned long flags);
>> static void invoke_rcu_core(void);
>> static void rcu_report_exp_rdp(struct rcu_data *rdp);
>> +static void rcu_report_qs_rdp(struct rcu_data *rdp);
>> static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
>> static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
>> static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
>> @@ -2032,6 +2033,17 @@ static void rcu_gp_fqs(bool first_time)
>> }
>>
>> if (first_time) {
>> + /*
>> + * Immediately report QS for the GP kthread's CPU. The GP kthread
>> + * cannot be in an RCU read-side critical section while running
>> + * the FQS scan. This eliminates the need for a second FQS wait
>> + * when all other CPUs are idle.
>> + */
>> + preempt_disable();
>> + rcu_qs();
>> + rcu_report_qs_rdp(this_cpu_ptr(&rcu_data));
>> + preempt_enable();
>> +
>> /* Collect dyntick-idle snapshots. */
>> force_qs_rnp(rcu_watching_snap_save);
>> } else {
>> --
>> 2.34.1
>>