Message-Id: <C0FEC6BF-BCC8-4301-BBE6-8A49A05D50D6@m.fudan.edu.cn>
Date: Thu, 24 Jul 2025 15:55:47 +0800
From: 胡焜 <huk23@...udan.edu.cn>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: 白烁冉 <baishuoran@...eu.edu.cn>,
 Peter Zijlstra <peterz@...radead.org>,
 "jjtan24@...udan.edu.cn" <jjtan24@...udan.edu.cn>,
 linux-kernel@...r.kernel.org
Subject: Re: possible deadlock in smp_call_function_many_cond

Hi Thomas:

We've reproduced the issue on multiple kernels (6.16-rc4, among others), but we just noticed that the latest tree has moved on to rc7, so we're still verifying against it.

Per your suggestion, we turned on multiple options to print these details, including ftrace_dump_on_oops and the hrtimer_expire_entry, hrtimer_expire_exit, ipi_entry, ipi_exit, irq_handler_entry and irq_handler_exit trace events. We also enabled CONFIG_CSD_LOCK_WAIT_DEBUG.
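
For completeness, the relevant part of the kernel command line we used looks roughly like this (the exact ordering is ours; the tracepoint list is the one you suggested plus the IPI/IRQ-handler events):

ftrace_dump_on_oops trace_event=hrtimer_expire_entry,hrtimer_expire_exit,ipi_entry,ipi_exit,irq_handler_entry,irq_handler_exit

with CONFIG_CSD_LOCK_WAIT_DEBUG=y set in the .config.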

My guess is that the crash point might be the csd_lock_wait(csd) call in kernel/smp.c at line 885. There, the other CPUs are notified via IPI to execute flush_tlb_mm_range(), and the calling CPU enters a spin wait. However, we have not yet determined which CPU performed the operation that caused the wait to time out.
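
To make sure I'm reading the code path correctly: smp_call_function_many_cond() queues a call_single_data entry per target CPU, sends the IPIs, and then spins in csd_lock_wait() until each target's IPI handler clears the CSD lock flag. A minimal user-space model of that handshake (my own simplification for discussion, not the kernel code) would be:

/* csd_wait_model.c -- user-space model of the CSD lock handshake
 * (a simplified analogue of kernel/smp.c, not the actual kernel code).
 * Build: cc -std=c11 -pthread csd_wait_model.c -o csd_wait_model
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define CSD_FLAG_LOCK 1u

struct csd {
	atomic_uint flags;            /* stands in for csd->node.u_flags      */
	void (*func)(void *);         /* the cross-call, e.g. a TLB flush     */
	void *info;
};

static void flush_tlb_stub(void *info) { (void)info; /* pretend work */ }

/* "Target CPU": the IPI handler side -- runs the callback, then unlocks. */
static void *ipi_handler(void *arg)
{
	struct csd *csd = arg;

	sleep(1);                     /* simulated IPI delivery latency       */
	csd->func(csd->info);         /* run the requested function           */
	atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK,
				  memory_order_release);  /* analogue of csd_unlock() */
	return NULL;
}

int main(void)
{
	struct csd csd = { .func = flush_tlb_stub };
	pthread_t target;

	/* analogue of csd_lock(): mark the request in flight before "sending the IPI" */
	atomic_fetch_or_explicit(&csd.flags, CSD_FLAG_LOCK, memory_order_acquire);
	pthread_create(&target, NULL, ipi_handler, &csd);

	/* analogue of csd_lock_wait(): spin until the target clears CSD_FLAG_LOCK */
	while (atomic_load_explicit(&csd.flags, memory_order_acquire) & CSD_FLAG_LOCK)
		;

	pthread_join(target, NULL);
	puts("target acknowledged, csd_lock_wait() returned");
	return 0;
}

In this model, if the "target CPU" thread never services the IPI and never clears CSD_FLAG_LOCK, the requester spins forever, which is the symptom the CSD lock wait debug timeout reports.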

I'll first provide the log reproduced on 6.16-rc4; we'll re-verify it on rc7.

thanks,
Kun 

> On Jul 21, 2025, at 03:38, Thomas Gleixner <tglx@...utronix.de> wrote:
> 
> On Fri, Jul 11 2025 at 22:03, 白烁冉 wrote:
>> When using our customized Syzkaller to fuzz the latest Linux kernel,
>> the following crash (the 122nd) was triggered.
>> 
>> HEAD commit: 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2
>> git tree: upstream
> 
> That's not the latest kernel.
> 
>> Output: https://github.com/pghk13/Kernel-Bug/blob/main/0702_6.14/INFO%3A%20rcu%20detected%20stall%20in%20sys_select/122report.txt
>> Kernel config: https://github.com/pghk13/Kernel-Bug/blob/main/0305_6.14rc3/config.txt
>> C reproducer: https://github.com/pghk13/Kernel-Bug/blob/main/0702_6.14/INFO%3A%20rcu%20detected%20stall%20in%20sys_select/122repro.c
>> Syzlang reproducer: https://github.com/pghk13/Kernel-Bug/blob/main/0702_6.14/INFO%3A%20rcu%20detected%20stall%20in%20sys_select/122repro.txt
>> 

[Attachment: 122log.txt (text/plain, 918769 bytes)]

> 
>> Our reproducer mounts a constructed filesystem image.
>> 
>> The error occurred around line 880 of the code, specifically during
>> the call to csd_lock_wait. CPU 1 (the RCU GP kthread) is executing
>> the perf_event_open system call, which needs to update tracepoint
> 
> I can't find a perf_event_open() syscall in the C reproducer. So how is
> that supposed to be reproduced?
> 
>> calls on all CPUs, and smp_call_function_many_cond is stuck waiting
>> for CPU 2 to respond to the IPI.  We have reproduced this issue
>> several more times on 6.14.
> 
> Again not the latest kernel. Please run it against Linus' latest tree and
> if it still triggers, provide proper information how to reproduce. If
> not you should be able to bisect the fix.
> 
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu: 2-...!: (3 GPs behind) idle=b834/1/0x4000000000000000 softirq=23574/23574 fqs=5
>> rcu: (detected by 1, t=10502 jiffies, g=19957, q=594 ncpus=4)
> 
> So CPU 1 detects an RCU stall on CPU2
> 
>> Sending NMI from CPU 1 to CPUs 2:
>> NMI backtrace for cpu 2
>> CPU: 2 UID: 0 PID: 9461 Comm: sshd Not tainted 6.14.0 #1
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>> RIP: 0010:__lock_acquire+0x106/0x46b0
>> Code: ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 ec 35 00 00 49 8b 45 00 48 3d a0 c7 8a 93 0f 84 29 0f 00 00 44 8b 05 2a dc 74 0c <45> 85 c0 0f 84 ad 06 00 00 48 3d e0 c7 8a 93 0f 84 a1 06 00 00 41
>> RSP: 0018:ffffc90000568ac8 EFLAGS: 00000002
>> RAX: ffffffff9aab9a20 RBX: 0000000000000000 RCX: 1ffff920000ad16c
>> RDX: 1ffffffff35692cf RSI: 0000000000000000 RDI: ffffffff9ab49678
>> RBP: ffff8880201aa480 R08: 0000000000000001 R09: 0000000000000001
>> R10: 0000000000000001 R11: ffffffff90617d17 R12: 0000000000000000
>> R13: ffffffff9ab49678 R14: 0000000000000000 R15: 0000000000000000
>> FS:  00007fa644657900(0000) GS:ffff88802b900000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f0fa92178a9 CR3: 0000000000e90000 CR4: 0000000000750ef0
>> PKRU: 55555554
>> Call Trace:
>> <NMI>
>> </NMI>
>> <IRQ>
>> lock_acquire+0x1b6/0x570
>> _raw_spin_lock_irqsave+0x3d/0x60
>> debug_object_deactivate+0x139/0x390
>> __hrtimer_run_queues+0x416/0xc30
>> hrtimer_interrupt+0x398/0x890
>> __sysvec_apic_timer_interrupt+0x114/0x400
>> sysvec_apic_timer_interrupt+0xa3/0xc0
> 
> which handles the timer interrupt. What you cut off in your report is:
> 
> [  321.491987][    C2] hrtimer: interrupt took 31336677795 ns
> 
> That means the hrtimer interrupt got stuck for 32 seconds (!!!). That
> warning is only emitted once, so I assume there is something weird going
> on with hrtimers and one of their callbacks. But there is no indication
> where this comes from.
> 
> Can you enable the hrtimer_expire_entry/exit tracepoints on the kernel
> command line and add 'ftrace_dump_on_oops' as well, so that the trace
> gets dumped with the rcu stall splat?
> 
> Thanks,
> 
>        tglx
> 