Message-ID: <fe07319e-01ec-4700-bc89-e548f4dc7271@nvidia.com>
Date: Mon, 22 Dec 2025 17:26:29 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Yao Kai <yaokai34@...wei.com>
Cc: rcu@...r.kernel.org, liuyongqiang13@...wei.com, paulmck@...nel.org,
frederic@...nel.org, neeraj.upadhyay@...nel.org, josh@...htriplett.org,
boqun.feng@...il.com, urezki@...il.com, rostedt@...dmis.org,
mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
qiang.zhang@...ux.dev, linux-kernel@...r.kernel.org, yujiacheng3@...wei.com
Subject: Re: [PATCH] rcu: Fix rcu_read_unlock() deadloop due to softirq
On 12/22/2025 3:06 AM, Yao Kai wrote:
> Commit 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in
> __rcu_read_unlock()") removes the recursion-protection code from
> __rcu_read_unlock(). As a result, with ftrace enabled, we can hit a deadloop
> in raise_softirq_irqoff(), as follows:
>
> WARNING: CPU: 0 PID: 0 at kernel/trace/trace.c:3021 __ftrace_trace_stack.constprop.0+0x172/0x180
> Modules linked in: my_irq_work(O)
> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G O 6.18.0-rc7-dirty #23 PREEMPT(full)
> Tainted: [O]=OOT_MODULE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:__ftrace_trace_stack.constprop.0+0x172/0x180
> RSP: 0018:ffffc900000034a8 EFLAGS: 00010002
> RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: ffffffff826d7b87 RDI: ffffffff826e9329
> RBP: 0000000000090009 R08: 0000000000000005 R09: ffffffff82afbc4c
> R10: 0000000000000008 R11: 0000000000011d7a R12: 0000000000000000
> R13: ffff888003874100 R14: 0000000000000003 R15: ffff8880038c1054
> FS: 0000000000000000(0000) GS:ffff8880fa8ea000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055b31fa7f540 CR3: 00000000078f4005 CR4: 0000000000770ef0
> PKRU: 55555554
> Call Trace:
> <IRQ>
> trace_buffer_unlock_commit_regs+0x6d/0x220
> trace_event_buffer_commit+0x5c/0x260
> trace_event_raw_event_softirq+0x47/0x80
> raise_softirq_irqoff+0x6e/0xa0
> rcu_read_unlock_special+0xb1/0x160
> unwind_next_frame+0x203/0x9b0
> __unwind_start+0x15d/0x1c0
> arch_stack_walk+0x62/0xf0
> stack_trace_save+0x48/0x70
> __ftrace_trace_stack.constprop.0+0x144/0x180
> trace_buffer_unlock_commit_regs+0x6d/0x220
> trace_event_buffer_commit+0x5c/0x260
> trace_event_raw_event_softirq+0x47/0x80
> raise_softirq_irqoff+0x6e/0xa0
> rcu_read_unlock_special+0xb1/0x160
> unwind_next_frame+0x203/0x9b0
> __unwind_start+0x15d/0x1c0
> arch_stack_walk+0x62/0xf0
> stack_trace_save+0x48/0x70
> __ftrace_trace_stack.constprop.0+0x144/0x180
> trace_buffer_unlock_commit_regs+0x6d/0x220
> trace_event_buffer_commit+0x5c/0x260
> trace_event_raw_event_softirq+0x47/0x80
> raise_softirq_irqoff+0x6e/0xa0
> rcu_read_unlock_special+0xb1/0x160
> unwind_next_frame+0x203/0x9b0
> __unwind_start+0x15d/0x1c0
> arch_stack_walk+0x62/0xf0
> stack_trace_save+0x48/0x70
> __ftrace_trace_stack.constprop.0+0x144/0x180
> trace_buffer_unlock_commit_regs+0x6d/0x220
> trace_event_buffer_commit+0x5c/0x260
> trace_event_raw_event_softirq+0x47/0x80
> raise_softirq_irqoff+0x6e/0xa0
> rcu_read_unlock_special+0xb1/0x160
> __is_insn_slot_addr+0x54/0x70
> kernel_text_address+0x48/0xc0
> __kernel_text_address+0xd/0x40
> unwind_get_return_address+0x1e/0x40
> arch_stack_walk+0x9c/0xf0
> stack_trace_save+0x48/0x70
> __ftrace_trace_stack.constprop.0+0x144/0x180
> trace_buffer_unlock_commit_regs+0x6d/0x220
> trace_event_buffer_commit+0x5c/0x260
> trace_event_raw_event_softirq+0x47/0x80
> __raise_softirq_irqoff+0x61/0x80
> __flush_smp_call_function_queue+0x115/0x420
> __sysvec_call_function_single+0x17/0xb0
> sysvec_call_function_single+0x8c/0xc0
> </IRQ>
>
> Commit b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
> fixed the infinite loop in rcu_read_unlock_special() for IRQ work by
> setting a flag before calling irq_work_queue_on(). Fix this issue by
> setting the same flag before calling raise_softirq_irqoff(), and rename the
> flag to defer_qs_pending to reflect its more general use.
>
> Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()")
> Reported-by: Tengda Wu <wutengda2@...wei.com>
> Signed-off-by: Yao Kai <yaokai34@...wei.com>
> Reviewed-by: Joel Fernandes <joelagnelf@...dia.com>
Thank you. I will apply this patch. I am preparing a few other RCU patches from
me and others, and will send them together as a series for review/testing for 7.0.
- Joel
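
For readers following the thread, the recursion in the trace above and the flag-based guard described in the commit message can be sketched in user space. This is a minimal illustration only, not the kernel code: defer_qs_pending is the flag name from the patch, while unlock_special(), traced_raise_softirq(), simulate() and MAX_DEPTH are hypothetical stand-ins invented for this sketch. It assumes the flag is tested and set before the softirq is raised, and it does not model where the kernel clears the flag again.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8   /* hypothetical bound standing in for stack exhaustion */

static bool defer_qs_pending;  /* flag name from the patch; the rest is illustrative */
static int raise_calls;        /* how many times the softirq raise was attempted */
static int depth;              /* current nesting depth of the modeled recursion */

static void unlock_special(bool use_guard);

/*
 * Models a traced raise_softirq_irqoff(): the tracepoint records a stack
 * trace, the unwinder drops an RCU read lock, and that re-enters the
 * unlock slow path.
 */
static void traced_raise_softirq(bool use_guard)
{
    raise_calls++;
    if (++depth < MAX_DEPTH)
        unlock_special(use_guard);
    depth--;
}

/* Models the slow path of rcu_read_unlock() that raises the softirq. */
static void unlock_special(bool use_guard)
{
    if (use_guard) {
        if (defer_qs_pending)
            return;               /* a deferral is already pending: do not re-raise */
        defer_qs_pending = true;  /* set the flag before raising */
    }
    traced_raise_softirq(use_guard);
}

/* Runs one outermost unlock and returns the number of raise attempts. */
static int simulate(bool use_guard)
{
    defer_qs_pending = false;
    raise_calls = 0;
    depth = 0;
    unlock_special(use_guard);
    assert(depth == 0);
    return raise_calls;
}
```

Calling simulate(false) recurses until the artificial depth bound is reached, which mirrors the repeating frames in the trace. Calling simulate(true) raises once, because the nested call sees the flag already set and returns early.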