Message-ID: <CA+V-a8tWytDVmsk-PK23e4gChXH0pMDR9cKc_xEO4WXpNtr3eA@mail.gmail.com>
Date: Tue, 7 Oct 2025 17:40:09 +0100
From: "Lad, Prabhakar" <prabhakar.csengg@...il.com>
To: netdev <netdev@...r.kernel.org>
Cc: Linux-Renesas <linux-renesas-soc@...r.kernel.org>, 
	Fabrizio Castro <fabrizio.castro.jz@...esas.com>
Subject: CPU stalls with CONFIG_PREEMPT_RT enabled on next-20251006 (Renesas
 RZ/G2L & RZ/G3E)

Hi All,

With CONFIG_PREEMPT_RT enabled, I’m observing RCU CPU stalls
originating in the network Rx path, with two different Ethernet
drivers on Renesas platforms.

The first case is on the RZ/G3E SoC using the STMMAC driver:
-----x-----x------x------x------x------x------x------x------x------x------x------x------x
[  173.505971] rcu: INFO: rcu_preempt self-detected stall on CPU
[  173.506014] rcu: 0-....: (2 GPs behind) idle=de74/1/0x4000000000000000 softirq=0/0 fqs=2178 rcuc=5257 jiffies(starved)
[  173.506077] rcu: (t=5250 jiffies g=2757 q=79 ncpus=4)
[  173.506118] CPU: 0 UID: 0 PID: 290 Comm: irq/107-eth0 Not tainted 6.17.0-next-20251006-00001-gaef898d60052 #19 PREEMPT_RT
[  173.506163] Hardware name: Renesas SMARC EVK version 2 based on r9a09g047e57 (DT)
[  173.506182] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  173.506217] pc : rt_spin_lock+0x40/0x190
[  173.506280] lr : stmmac_tx_clean.constprop.0+0x80/0x7a8
[  173.506323] sp : ffff800082883a40
[  173.506338] x29: ffff800082883a60 x28: ffff0000c1eb0a80 x27: ffff800082883c18
[  173.506397] x26: ffff80007a330000 x25: ffff800082883c00 x24: ffff80008173a000
[  173.506447] x23: 00000000ffff8479 x22: 0000000000000003 x21: 0000000000000003
[  173.506497] x20: ffff0000c1eb8580 x19: ffff0000c1eb8580 x18: 0000000000000001
[  173.506545] x17: ffff0000c004f800 x16: 0000000000000bfe x15: 0000000000000000
[  173.506593] x14: 0000767bd1d30308 x13: 00000000000003f8 x12: 0000000000000000
[  173.506641] x11: 0000000000000000 x10: 0000000000000000 x9 : 00000000000013c0
[  173.506687] x8 : ffff800082883c88 x7 : 0000000000000000 x6 : 0000000000000000
[  173.506734] x5 : 0000000000000480 x4 : ffff0000c1eb0000 x3 : ffff800082883b67
[  173.506781] x2 : ffff0000c1eb8598 x1 : ffff0000c1f21140 x0 : 0000000000000000
[  173.506829] Call trace:
[  173.506843]  rt_spin_lock+0x40/0x190 (P)
[  173.506905]  stmmac_tx_clean.constprop.0+0x80/0x7a8
[  173.506948]  stmmac_napi_poll_tx+0x6c/0x154
[  173.506989]  __napi_poll.constprop.0+0x38/0x188
[  173.507041]  net_rx_action+0x118/0x264
[  173.507088]  handle_softirqs.isra.0+0xe4/0x1ec
[  173.507149]  __local_bh_enable_ip+0xc4/0x128
[  173.507186]  irq_forced_thread_fn+0x48/0x60
[  173.507240]  irq_thread+0x188/0x31c
[  173.507292]  kthread+0x12c/0x210
[  173.507337]  ret_from_fork+0x10/0x20
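
Decoding the top of that trace: with PREEMPT_RT, spin_lock() is
implemented as rt_spin_lock(), a sleeping rtmutex-based lock, and the
softirq work runs from the forced-threaded IRQ handler (irq/107-eth0)
when it re-enables bottom halves. So the pattern is roughly the
following (an illustrative sketch only, not the actual stmmac code):

```
/*
 * Illustrative sketch, not the real driver code.  Under
 * CONFIG_PREEMPT_RT, spin_lock() maps to rt_spin_lock(), so this
 * tx-clean path runs in the irq/107-eth0 thread and can block in the
 * rtmutex slowpath on the queue lock instead of briefly disabling
 * preemption as it would on a non-RT kernel.
 */
static int tx_clean(struct tx_queue *q, int budget)
{
	spin_lock(&q->lock);	/* -> rt_spin_lock() on PREEMPT_RT */
	/* ... reclaim completed Tx descriptors ... */
	spin_unlock(&q->lock);
	return budget;
}
```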

The second case is on the RZ/G2L SoC using the RAVB driver:
-----x-----x------x------x------x------x------x------x------x------x------x------x------x
[   70.821322] rcu: INFO: rcu_preempt self-detected stall on CPU
[   70.821351] rcu: 0-....: (4970 ticks this GP) idle=e2c4/1/0x4000000000000000 softirq=0/0 fqs=2622 rcuc=5112 jiffies(starved)
[   70.821366] rcu: (t=5250 jiffies g=6729 q=98 ncpus=2)
[   70.821382] CPU: 0 UID: 0 PID: 101 Comm: irq/45-11c20000 Not tainted 6.17.0-next-20251006-00001-gaef898d60052 #19 PREEMPT_RT
[   70.821392] Hardware name: Renesas SMARC EVK based on r9a07g044l2 (DT)
[   70.821397] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   70.821404] pc : rt_spin_trylock+0x44/0xd8
[   70.821426] lr : try_charge_memcg+0xd0/0x7a0
[   70.821442] sp : ffff8000820cb3e0
[   70.821445] x29: ffff8000820cb430 x28: 0000000000000001 x27: 0000000000000800
[   70.821459] x26: 0000000000000000 x25: ffff00000bda1140 x24: ffff80008185f3e0
[   70.821469] x23: 0000000000000040 x22: ffff00000cc7e800 x21: ffff80008167e600
[   70.821479] x20: 0000000000000820 x19: 0000000000000002 x18: 0000000000000000
[   70.821489] x17: ffff000009c3e580 x16: 00000000000003f6 x15: 000000000000000c
[   70.821498] x14: 0000000000000000 x13: 0000000000000014 x12: ffff00000bdba800
[   70.821508] x11: ffff00000bda8000 x10: 0000000000005114 x9 : 0000000000000000
[   70.821517] x8 : ffff8000820cb588 x7 : 0000000000000000 x6 : 0000000000000000
[   70.821526] x5 : 0000000000000000 x4 : ffff00000bda1140 x3 : ffff00007ddda618
[   70.821536] x2 : ffff00000bda1140 x1 : ffff00000bda1140 x0 : 0000000000000001
[   70.821546] Call trace:
[   70.821550]  rt_spin_trylock+0x44/0xd8 (P)
[   70.821564]  mem_cgroup_sk_charge+0x2c/0x80
[   70.821572]  __sk_mem_raise_allocated+0x1cc/0x380
[   70.821584]  __sk_mem_schedule+0x3c/0x60
[   70.821592]  tcp_try_rmem_schedule+0x88/0x48c
[   70.821603]  tcp_data_queue+0x2b0/0xe1c
[   70.821611]  tcp_rcv_established+0x3bc/0xba0
[   70.821619]  tcp_v4_do_rcv+0x1ec/0x2b8
[   70.821630]  tcp_v4_rcv+0x954/0xf20
[   70.821640]  ip_protocol_deliver_rcu+0x38/0x1a0
[   70.821648]  ip_local_deliver_finish+0x90/0x120
[   70.821654]  ip_local_deliver+0x7c/0x124
[   70.821661]  ip_rcv+0x74/0x128
[   70.821667]  __netif_receive_skb_core.constprop.0+0x928/0x11b0
[   70.821679]  __netif_receive_skb_list_core+0xe8/0x210
[   70.821688]  netif_receive_skb_list_internal+0x1dc/0x2d0
[   70.821697]  napi_complete_done+0x80/0x1bc
[   70.821705]  ravb_poll+0x170/0x1e4
[   70.821715]  __napi_poll.constprop.0+0x38/0x188
[   70.821723]  net_rx_action+0x118/0x264
[   70.821732]  handle_softirqs.isra.0+0xe4/0x1ec
[   70.821746]  __local_bh_enable_ip+0xc4/0x128
[   70.821753]  irq_forced_thread_fn+0x48/0x60
[   70.821765]  irq_thread+0x188/0x31c
[   70.821775]  kthread+0x12c/0x210
[   70.821785]  ret_from_fork+0x10/0x20


Has anyone else observed similar stalls with PREEMPT_RT enabled on
recent linux-next kernels? I’m trying to determine whether this is
primarily a core RT/scheduling issue or something driver-specific.
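
If more data would help, I can rerun with the standard RCU stall knobs
on the kernel command line (the values below are just examples) to make
the detector fire sooner and dump the ftrace buffer when it does:

```
rcupdate.rcu_cpu_stall_timeout=10 rcupdate.rcu_cpu_stall_ftrace_dump=1
```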

Any insights or suggestions would be appreciated.

Cheers,
Prabhakar
