Message-ID: <f7fce1ac56b908d49bf75359dd679d6664b648bf.camel@mediatek.com>
Date: Thu, 24 Jul 2025 02:49:59 +0000
From: Tze-nan Wu (吳澤南) <Tze-nan.Wu@...iatek.com>
To: "frederic@...nel.org" <frederic@...nel.org>, "joelagnelf@...dia.com"
<joelagnelf@...dia.com>
CC: "jiangshanlai@...il.com" <jiangshanlai@...il.com>,
Cheng-Jui Wang (王正睿)
<Cheng-Jui.Wang@...iatek.com>, "josh@...htriplett.org"
<josh@...htriplett.org>, "boqun.feng@...il.com" <boqun.feng@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
"mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
"linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
Lorry Luo (罗磊) <Lorry.Luo@...iatek.com>,
"weiyangyang@...o.com" <weiyangyang@...o.com>,
Bobule Chang (張弘義) <bobule.chang@...iatek.com>,
"urezki@...il.com" <urezki@...il.com>, "qiang.zhang@...ux.dev"
<qiang.zhang@...ux.dev>, "neeraj.upadhyay@...nel.org"
<neeraj.upadhyay@...nel.org>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, wsd_upstream
<wsd_upstream@...iatek.com>, "paulmck@...nel.org" <paulmck@...nel.org>,
"matthias.bgg@...il.com" <matthias.bgg@...il.com>, "rcu@...r.kernel.org"
<rcu@...r.kernel.org>, "joel@...lfernandes.org" <joel@...lfernandes.org>
Subject: Re: [PATCH] rcu: Fix delayed execution of hurry callbacks
On Wed, 2025-07-23 at 10:32 -0400, Joel Fernandes wrote:
>
> On 7/18/2025 5:10 AM, Tze-nan Wu (吳澤南) wrote:
> > On Thu, 2025-07-17 at 15:34 +0200, Frederic Weisbecker wrote:
> > >
> > > On Thu, Jul 17, 2025 at 01:53:38PM +0800, Tze-nan Wu wrote:
> > > > We observed a regression in our customer’s environment after enabling
> > > > CONFIG_LAZY_RCU. In the Android Update Engine scenario, where ioctl()
> > > > is used heavily, we found that callbacks queued via call_rcu_hurry
> > > > (such as percpu_ref_switch_to_atomic_rcu) can sometimes be delayed by
> > > > up to 5 seconds before execution. This occurs because the new grace
> > > > period does not start immediately after the previous one completes.
> > > >
> > > > The root cause is that the wake_nocb_gp_defer() function now checks
> > > > "rdp->nocb_defer_wakeup" instead of "rdp_gp->nocb_defer_wakeup". On
> > > > CPUs that are not rcuog, "rdp->nocb_defer_wakeup" may always be
> > > > RCU_NOCB_WAKE_NOT. This can cause "rdp_gp->nocb_defer_wakeup" to be
> > > > downgraded and the "rdp_gp->nocb_timer" to be postponed by up to 10
> > > > seconds, delaying the execution of hurry RCU callbacks.
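> > > >
> > > > For reference, the lazy branch of wake_nocb_gp_defer() currently looks
> > > > roughly like this (a simplified sketch of kernel/rcu/tree_nocb.h;
> > > > locking, tracing, and the other waketype branches are omitted, and
> > > > LAZY_FLUSH_DELAY stands in for the actual ~10*HZ lazy-flush period):
> > > >
> > > >     struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; /* rcuog leader's rdp */
> > > >
> > > >     if (waketype == RCU_NOCB_WAKE_LAZY &&
> > > >         rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
> > > >         /*
> > > >          * On a CPU that is not the rcuog leader, rdp->nocb_defer_wakeup
> > > >          * stays RCU_NOCB_WAKE_NOT, so this branch is taken even when
> > > >          * rdp_gp already has a near-term RCU_NOCB_WAKE pending: the
> > > >          * leader's timer is pushed ~10 seconds out and its wake type
> > > >          * is overwritten with RCU_NOCB_WAKE_LAZY.
> > > >          */
> > > >         mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_DELAY);
> > > >         WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > > >     }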
> > > >
> > > > The trace log of one scenario we encountered is as follows:
> > > > // previous GP ends at this point
> > > > rcu_preempt   [000] d..1. 137.240210: rcu_grace_period: rcu_preempt 8369 end
> > > > rcu_preempt   [000] ..... 137.240212: rcu_grace_period: rcu_preempt 8372 reqwait
> > > > // call_rcu_hurry enqueues "percpu_ref_switch_to_atomic_rcu", the callback waited on by UpdateEngine
> > > > update_engine [002] d..1. 137.301593: __call_rcu_common: wyy: unlikely p_ref = 00000000********. lazy = 0
> > > > // FirstQ on cpu 2: rdp_gp->nocb_timer is set to fire after 1 jiffy (4ms)
> > > > // and rdp_gp->nocb_defer_wakeup is set to RCU_NOCB_WAKE
> > > > update_engine [002] d..2. 137.301595: rcu_nocb_wake: rcu_preempt 2 FirstQ on cpu2 with rdp_gp (cpu0).
> > > > // a FirstBQ event on cpu2 during that 1 jiffy postpones the timer by 10 seconds
> > > > // also, rdp_gp->nocb_defer_wakeup is overwritten to RCU_NOCB_WAKE_LAZY
> > > > update_engine [002] d..1. 137.301601: rcu_nocb_wake: rcu_preempt 2 WakeEmptyIsDeferred
> > > > ...
> > > > ...
> > > > ...
> > > > // before the 10-second timeout, cpu0 received another call_rcu_hurry,
> > > > // which reset the timer to jiffies+1 and set the waketype = RCU_NOCB_WAKE
> > > > kworker/u32:0 [000] d..2. 142.557564: rcu_nocb_wake: rcu_preempt 0 FirstQ
> > > > kworker/u32:0 [000] d..1. 142.557576: rcu_nocb_wake: rcu_preempt 0 WakeEmptyIsDeferred
> > > > kworker/u32:0 [000] d..1. 142.558296: rcu_nocb_wake: rcu_preempt 0 WakeNot
> > > > kworker/u32:0 [000] d..1. 142.558562: rcu_nocb_wake: rcu_preempt 0 WakeNot
> > > > // idle (do_nocb_deferred_wakeup) wakes rcuog due to waketype == RCU_NOCB_WAKE
> > > > <idle>        [000] d..1. 142.558786: rcu_nocb_wake: rcu_preempt 0 DoWake
> > > > <idle>        [000] dN.1. 142.558839: rcu_nocb_wake: rcu_preempt 0 DeferredWake
> > > > rcuog/0       [000] ..... 142.558871: rcu_nocb_wake: rcu_preempt 0 EndSleep
> > > > rcuog/0       [000] ..... 142.558877: rcu_nocb_wake: rcu_preempt 0 Check
> > > > // finally rcuog requests a new GP at this point (5 seconds after the FirstQ event)
> > > > rcuog/0       [000] d..2. 142.558886: rcu_grace_period: rcu_preempt 8372 newreq
> > > > rcu_preempt   [001] d..1. 142.559458: rcu_grace_period: rcu_preempt 8373 start
> > > > ...
> > > > rcu_preempt   [000] d..1. 142.564258: rcu_grace_period: rcu_preempt 8373 end
> > > > rcuop/2       [000] D..1. 142.566337: rcu_batch_start: rcu_preempt CBs=219 bl=10
> > > > // the hurry CB is invoked at this point
> > > > rcuop/2       [000] b.... 142.566352: blk_queue_usage_counter_release: wyy: wakeup. p_ref = 00000000********.
> > > >
> > > > This patch changes the condition to check "rdp_gp->nocb_defer_wakeup"
> > > > in the lazy path. This prevents an already scheduled "rdp_gp->nocb_timer"
> > > > from being postponed and avoids overwriting "rdp_gp->nocb_defer_wakeup"
> > > > when it is not RCU_NOCB_WAKE_NOT.
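> > > >
> > > > In other words, the change is roughly the following (illustrative only,
> > > > same simplified form as the sketch above, not the exact hunk):
> > > >
> > > >     if (waketype == RCU_NOCB_WAKE_LAZY &&
> > > > -       rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
> > > > +       rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
> > > >         mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_DELAY);
> > > >         WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > > >     }
> > > >
> > > > With the leader's state checked, a later lazy enqueue can no longer push
> > > > out a timer that is already armed for a non-lazy deferred wakeup, nor
> > > > downgrade the pending wake type.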
> > > >
> > > > Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
> > > > Co-developed-by: Cheng-jui Wang <cheng-jui.wang@...iatek.com>
> > > > Signed-off-by: Cheng-jui Wang <cheng-jui.wang@...iatek.com>
> > > > Co-developed-by: Lorry.Luo@...iatek.com
> > > > Signed-off-by: Lorry.Luo@...iatek.com
> > > > Tested-by: weiyangyang@...o.com
> > > > Signed-off-by: weiyangyang@...o.com
> > > > Signed-off-by: Tze-nan Wu <Tze-nan.Wu@...iatek.com>
> > > > ---
> > > > The regression was first observed by wyy in the Update Engine scenario
> > > > with CONFIG_LAZY_RCU enabled: there is an additional delay of 4–5
> > > > seconds during the heavy ioctl API calls, waiting for
> > > > percpu_ref_switch_to_atomic_rcu (the hurry RCU CB) to complete.
> > > >
> > > > Initially, we suspected that the percpu_ref_switch_to_atomic_rcu
> > > > function itself was taking too long. However, after enabling some
> > > > custom trace events along with rcu_do_batch, rcu_nocb_wake, and
> > > > rcu_grace_period, we found that the root cause was that rcuog was not
> > > > being woken up in time to request a new GP. This led to the delay in
> > > > invoking the hurry RCU callback (percpu_ref_switch_to_atomic_rcu).
> > > >
> > > > Environment:
> > > > Android-16, Kernel: 6.12, 8 CPUs (ARM)
> > > >
> > > > Configuration:
> > > > CONFIG_TREE_RCU=y
> > > > CONFIG_PREEMPT_RCU=y
> > > > CONFIG_LAZY_RCU=y
> > > > CONFIG_RCU_NOCB_CPU=y
> > > > CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y
> > > > rcu_nocb_gp_stride = -1 (default is 4 for 8 cores)
> > > > jiffies_lazy_flush = 10 * HZ
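> > > >
> > > > For reference, assuming HZ=250 (consistent with the "1 jiffy (4ms)"
> > > > note in the trace above): jiffies_lazy_flush = 10 * HZ = 2500 jiffies
> > > > ~= 10 s, which matches the postponement window described above, while
> > > > the non-lazy deferred wakeup is only jiffies + 1 ~= 4 ms away.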
> > > >
> > > > Contributions:
> > > > Tze-Nan Wu:
> > > > Collaborated with Cheng-Jui to discuss which tracepoints needed to be
> > > > added, jointly analyzed the trace logs, identified the root cause, and
> > > > proposed this upstream change.
> > > >
> > > > Cheng-Jui Wang:
> > > > Provided many valuable suggestions during the debugging process,
> > > > repeatedly found breakthroughs when we were stuck, and helped identify
> > > > the root cause.
> > > >
> > > > Lorry Luo:
> > > > Assisted in verifying whether the hurry RCU callback took too long to
> > > > execute or was deferred, supported with testing, and helped with
> > > > communication.
> > > >
> > > > Weiyangyang:
> > > > Main tester who discovered the regression scenario, confirmed that
> > > > enabling CONFIG_LAZY_RCU caused the regression, and verified that this
> > > > patch resolves the issue.
> > >
> > > Nice team work :-)
> > >
> > Thanks :)
> > > >
> > > > Note:
> > > > With my limited understanding of lazy RCU, I am not fully confident
> > > > that this is a real issue. In my opinion, hurry callbacks should not
> > > > be delayed by other events such as the FirstBQ event.
> > > > If my understanding is incorrect, I would greatly appreciate any
> > > > guidance or clarification from the maintainers.
> > >
> > > Your understanding looks quite right!
> > >
> > Glad to know I was on the right track.
> >
> > > Reviewed-by: Frederic Weisbecker <frederic@...nel.org>
> > >
> > > Since the issue has been there for 3 years now and was introduced with
> > > the new CONFIG_LAZY_RCU feature, it can probably wait a few weeks
> > > for the next merge window.
>
>
> Nice, thanks!
>
> Reviewed-by: Joel Fernandes <joelagnelf@...dia.com>
>
> I suggest we also add the Fixes tag so that stable gets it.
>
> thanks,
>
> - Joel
>
I noticed that I’ve already added the Fixes: tag. Did you mean that we
should also add the line "Cc: stable@...r.kernel.org"?
If that’s the case, should I submit a new version of this patch (v2)?
thanks,
--
Tze-Nan Wu