Message-ID: <CAOU40uD=Ry0iOj-0X8DQeEz3avvk=a0k+zb5upGhYRfd3FhSKQ@mail.gmail.com>
Date: Thu, 20 Nov 2025 14:48:33 +0800
From: Xianying Wang <wangxianying546@...il.com>
To: pbonzini@...hat.com
Cc: vkuznets@...hat.com, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
"tglx@...utronix.de" <tglx@...utronix.de>
Subject: BUG: soft lockup in smp_call_function

Hi,

I hit a repeatable soft lockup in csd_lock_wait() via
smp_call_function_many_cond() while running a KVM guest under a
syzkaller workload. The lockup can be triggered by running the
attached C reproducer inside a KVM guest for some time. The reproducer
simply loops perf_event_open() + ioctl(PERF_EVENT_IOC_REFRESH) +
socket(AF_INET6, ...) in a child process while normal userspace
(systemd/journald) is running.
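
In case the pastebin link goes away, here is a minimal hand-written
sketch of what the child loop does. The exact perf_event_attr values
come from the syzkaller-generated reproducer (linked below); the ones
here are illustrative placeholders, with error handling trimmed:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        if (fork() == 0) {
                /* Child: hammer perf_event_open() + REFRESH + IPv6 sockets. */
                for (;;) {
                        struct perf_event_attr attr;
                        int fd, s;

                        memset(&attr, 0, sizeof(attr));
                        attr.size = sizeof(attr);
                        attr.type = PERF_TYPE_SOFTWARE;      /* placeholder */
                        attr.config = PERF_COUNT_SW_CPU_CLOCK;
                        attr.sample_period = 1;

                        /* Measure this task on any CPU. */
                        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
                        if (fd >= 0) {
                                ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);
                                close(fd);
                        }

                        s = socket(AF_INET6, SOCK_STREAM, 0);
                        if (s >= 0)
                                close(s);
                }
        }

        /* Parent stays alive alongside normal userspace. */
        for (;;)
                pause();
}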

This looks like a soft lockup caused by an incomplete cross-CPU TLB
flush (smp_call_function_many_cond() / csd_lock_wait()). The lockup
occurs in csd_lock_wait() in kernel/smp.c (inlined into
smp_call_function_many_cond()), with the upper call chain being
flush_tlb_mm_range() → kvm_flush_tlb_multi(), triggered by an ext4
fsync().
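
For context, the spin that the watchdog flags is the CSD wait in
kernel/smp.c, which in recent trees (without
CONFIG_CSD_LOCK_WAIT_DEBUG) is roughly:

/*
 * kernel/smp.c (paraphrased): the caller spins until the target CPU's
 * IPI handler clears CSD_FLAG_LOCK. If one target never processes the
 * IPI, this never terminates and the soft lockup watchdog fires.
 */
static void csd_lock_wait(struct __call_single_data *csd)
{
        smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
}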

Since this is a KVM guest and syzkaller stresses the system heavily,
this looks like a possible race between kvm_flush_tlb_multi() and CPU
state (e.g. CPU hotplug / vCPU offlining, or an incorrect cpumask) in
the paravirt TLB shootdown path, where one target CPU never processes
the IPI.
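
The window I suspect is in the guest's PV shootdown path in
arch/x86/kernel/kvm.c, which (roughly, from my reading of a recent
tree; please double-check against the commits above) drops vCPUs it
observes as preempted from the IPI mask and queues the flush for the
next VM-entry:

static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
                                const struct flush_tlb_info *info)
{
        u8 state;
        int cpu;
        struct kvm_steal_time *src;
        struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);

        cpumask_copy(flushmask, cpumask);
        /*
         * Preempted vCPUs are skipped: KVM_VCPU_FLUSH_TLB is queued in
         * steal_time and the flush is expected to happen on the next
         * VM-entry instead of via an IPI.
         */
        for_each_cpu(cpu, flushmask) {
                src = &per_cpu(steal_time, cpu);
                state = READ_ONCE(src->preempted);
                if (state & KVM_VCPU_PREEMPTED) {
                        if (try_cmpxchg(&src->preempted, &state,
                                        state | KVM_VCPU_FLUSH_TLB))
                                __cpumask_clear_cpu(cpu, flushmask);
                }
        }

        native_flush_tlb_multi(flushmask, info);
}

If one vCPU remains in flushmask but never handles the IPI (e.g. it
goes offline right around this window, or the preempted state is
observed inconsistently), the csd_lock_wait() above spins
indefinitely, which would match the report.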

This can be reproduced on:

HEAD commits:
  e5f0a698b34ed76002dc5cff3804a61c80233a7a
  6fab32bb6508abbb8b7b1c5498e44f0c32320ed5

report:                    https://pastebin.com/raw/Lu4Tz2SH
console output:            https://pastebin.com/raw/BxtNEXnq
console output (v6.17.0):  https://pastebin.com/raw/PBytK7Wq
kernel config:             https://pastebin.com/raw/1grwrT16
C reproducer:              https://pastebin.com/raw/ySCpMzk2

Let me know if you need more details or testing.

Best regards,
Xianying