Message-ID: <877c02vejr.ffs@tglx>
Date: Sun, 20 Jul 2025 21:38:00 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: 白烁冉 <baishuoran@...eu.edu.cn>, Peter Zijlstra
<peterz@...radead.org>
Cc: Kun Hu <huk23@...udan.edu.cn>, Jiaji Qin <jjtan24@...udan.edu.cn>,
linux-kernel@...r.kernel.org
Subject: Re: possible deadlock in smp_call_function_many_cond
On Fri, Jul 11 2025 at 22:03, 白烁冉 wrote:
> When using our customized Syzkaller to fuzz the latest Linux kernel,
> the following crash (122th)was triggered.
>
> HEAD commit: 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2
> git tree: upstream
That's not the latest kernel.
> Output:https://github.com/pghk13/Kernel-Bug/blob/main/0702_6.14/INFO%3A%20rcu%20detected%20stall%20in%20sys_select/122report.txt
> Kernel config:https://github.com/pghk13/Kernel-Bug/blob/main/0305_6.14rc3/config.txt
> C reproducer:https://github.com/pghk13/Kernel-Bug/blob/main/0702_6.14/INFO%3A%20rcu%20detected%20stall%20in%20sys_select/122repro.c
> Syzlang reproducer: https://github.com/pghk13/Kernel-Bug/blob/main/0702_6.14/INFO%3A%20rcu%20detected%20stall%20in%20sys_select/122repro.txt
>
> Our reproducer uses mounts a constructed filesystem image.
>
> The error occurred around line 880 of the code, specifically during
> the call to csd_lock_wait. The status of CPU 1 (RCU GP kthread):
> executing the perf_event_open system call, needs to update tracepoint
I can't find a perf_event_open() syscall in the C reproducer. So how is
that supposed to be reproduced?
> calls on all CPUs, and smp_call_function_many_cond is stuck waiting
> for CPU 2 to respond to the IPI. We have reproduced this issue
> several times on 6.14 again.
Again not the latest kernel. Please run it against Linus' latest tree and
if it still triggers, provide proper information on how to reproduce it.
If not, you should be able to bisect the fix.
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 2-...!: (3 GPs behind) idle=b834/1/0x4000000000000000 softirq=23574/23574 fqs=5
> rcu: (detected by 1, t=10502 jiffies, g=19957, q=594 ncpus=4)
So CPU 1 detects an RCU stall on CPU 2
> Sending NMI from CPU 1 to CPUs 2:
> NMI backtrace for cpu 2
> CPU: 2 UID: 0 PID: 9461 Comm: sshd Not tainted 6.14.0 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:__lock_acquire+0x106/0x46b0
> Code: ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 ec 35 00 00 49 8b 45 00 48 3d a0 c7 8a 93 0f 84 29 0f 00 00 44 8b 05 2a dc 74 0c <45> 85 c0 0f 84 ad 06 00 00 48 3d e0 c7 8a 93 0f 84 a1 06 00 00 41
> RSP: 0018:ffffc90000568ac8 EFLAGS: 00000002
> RAX: ffffffff9aab9a20 RBX: 0000000000000000 RCX: 1ffff920000ad16c
> RDX: 1ffffffff35692cf RSI: 0000000000000000 RDI: ffffffff9ab49678
> RBP: ffff8880201aa480 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000001 R11: ffffffff90617d17 R12: 0000000000000000
> R13: ffffffff9ab49678 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007fa644657900(0000) GS:ffff88802b900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f0fa92178a9 CR3: 0000000000e90000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <NMI>
> </NMI>
> <IRQ>
> lock_acquire+0x1b6/0x570
> _raw_spin_lock_irqsave+0x3d/0x60
> debug_object_deactivate+0x139/0x390
> __hrtimer_run_queues+0x416/0xc30
> hrtimer_interrupt+0x398/0x890
> __sysvec_apic_timer_interrupt+0x114/0x400
> sysvec_apic_timer_interrupt+0xa3/0xc0
which handles the timer interrupt. What you cut off in your report is:

  [ 321.491987][ C2] hrtimer: interrupt took 31336677795 ns

That means the hrtimer interrupt got stuck for more than 31 seconds (!!!).
That warning is only emitted once, so I assume there is something weird
going on with hrtimers and one of their callbacks. But there is no
indication where this comes from.

Can you enable the hrtimer_expire_entry/exit tracepoints on the kernel
command line and add 'ftrace_dump_on_oops' as well, so that the trace
gets dumped with the rcu stall splat?
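
A boot command line along these lines should do it. This is a sketch
which assumes that both tracepoints sit in the 'timer' trace system, so
please double check the event names under
/sys/kernel/tracing/events/timer/ on your kernel:

```
trace_event=timer:hrtimer_expire_entry,timer:hrtimer_expire_exit ftrace_dump_on_oops
```
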
Thanks,
tglx