Message-ID: <aMwdsFGkM-tMjHwc@google.com>
Date: Thu, 18 Sep 2025 07:56:48 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: syzbot <syzbot+a1a3cefd6148c781117c@...kaller.appspotmail.com>
Cc: linux-kernel@...r.kernel.org, luto@...nel.org, peterz@...radead.org,
syzkaller-bugs@...glegroups.com, tglx@...utronix.de,
"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [syzbot] [kernel?] WARNING in __vhost_task_wake
+Michael

Michael, this is the VHOST_TASK_FLAGS_KILLED WARN that was added[*] to detect
violations similar to KVM's.

	/*
	 * Checking VHOST_TASK_FLAGS_KILLED can race with signal delivery, but
	 * a race can only result in false negatives and this is just a sanity
	 * check, i.e. if KILLED is set, the caller is buggy no matter what.
	 */
	if (WARN_ON_ONCE(test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)))
		return;

I haven't been able to repro the splat, but after much staring I think the issue
is that vhost_task_fn() marks the task KILLED before invoking handle_sigkill().
If vhost_worker_flush() already holds worker->mutex before vhost_worker_killed()
runs, then the flush could wake a (not yet dead) task that already has KILLED set.
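
To make the suspected interleaving concrete, here is a rough single-threaded
userspace model of it. This is only a sketch, not kernel code: struct
race_model, model_wake() and simulate_wake_race() are made-up names, and the
step ordering is an assumption based on the description above, not something
confirmed by a reproducer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state involved; not the kernel's structs. */
struct race_model {
	bool killed;      /* models VHOST_TASK_FLAGS_KILLED */
	bool mutex_held;  /* models the flusher holding worker->mutex */
	int warnings;     /* how many times the sanity check would fire */
};

/* Models the sanity check in __vhost_task_wake(). */
static void model_wake(struct race_model *m)
{
	if (m->killed)
		m->warnings++;
}

/*
 * Replays one assumed interleaving of the flusher and the vhost task.
 * Returns the number of times the sanity check would have fired.
 */
static int simulate_wake_race(bool set_killed_before_handler)
{
	struct race_model m = { false, false, 0 };

	/* Step 1: the flusher takes the lock first. */
	m.mutex_held = true;

	/* Step 2: the task sees SIGKILL; the old ordering sets KILLED here. */
	if (set_killed_before_handler)
		m.killed = true;

	/*
	 * Step 3: the SIGKILL handler would block on the lock, so the
	 * flusher gets to queue its work and wake the task.
	 */
	assert(m.mutex_held);
	model_wake(&m);

	/* Step 4: the flusher drops the lock and the handler runs. */
	m.mutex_held = false;

	/* Step 5: the proposed ordering sets KILLED only after the handler. */
	if (!set_killed_before_handler)
		m.killed = true;

	return m.warnings;
}
```

In this model, simulate_wake_race(true) counts one warning and
simulate_wake_race(false) counts none, which is the behavior the reordering
is hoping for. It says nothing about other interleavings.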

Assuming waiting to set KILLED until after handle_sigkill() resolves the issue
(fingers crossed), the two options I see would be to apply the below as a fixup,
or simply drop the sanity check for 6.17 and add it back in 6.18 in conjunction
with the below (again, assuming it actually resolves the issue).

[*] https://lore.kernel.org/all/20250827194107.4142164-2-seanjc@google.com
On Wed, Sep 17, 2025, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: ae2d20002576 Add linux-next specific files for 20250917
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=11678f62580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=d737cfaddae0058c
> dashboard link: https://syzkaller.appspot.com/bug?extid=a1a3cefd6148c781117c
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1790ef62580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10242534580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/96197382e3c0/disk-ae2d2000.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/55a8a6ba3307/vmlinux-ae2d2000.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/c1b4ed5d6e2c/bzImage-ae2d2000.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a1a3cefd6148c781117c@...kaller.appspotmail.com
>
> ------------[ cut here ]------------
> WARNING: kernel/vhost_task.c:97 at __vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97, CPU#0: syz.0.174/6507
> Modules linked in:
> CPU: 0 UID: 0 PID: 6507 Comm: syz.0.174 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
> RIP: 0010:__vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97
> Code: 38 00 74 08 48 89 df e8 93 81 95 00 48 8b 3b 5b 41 5e 41 5f e9 a6 45 01 00 e8 31 ef 30 00 90 0f 0b 90 eb 8b e8 26 ef 30 00 90 <0f> 0b 90 5b 41 5e 41 5f e9 18 c7 ff 09 cc 0f 1f 80 00 00 00 00 90
> RSP: 0018:ffffc90003b7f680 EFLAGS: 00010293
> RAX: ffffffff818f2d7a RBX: ffff888033c7c400 RCX: ffff88802bc85ac0
> RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
> RBP: ffffc90003b7f750 R08: ffff888033c7c477 R09: 1ffff1100678f88e
> R10: dffffc0000000000 R11: ffffed100678f88f R12: 1ffff9200076fed8
> R13: dffffc0000000000 R14: 0000000000000002 R15: dffffc0000000000
> FS: 0000000000000000(0000) GS:ffff88812579c000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f9942aa0f98 CR3: 0000000027bfc000 CR4: 00000000003526f0
> Call Trace:
> <TASK>
> vhost_worker_queue+0x194/0x260 drivers/vhost/vhost.c:253
> __vhost_worker_flush+0x134/0x1e0 drivers/vhost/vhost.c:290
> vhost_worker_flush drivers/vhost/vhost.c:303 [inline]
> vhost_dev_flush+0xb2/0x130 drivers/vhost/vhost.c:313
> vhost_vsock_flush drivers/vhost/vsock.c:698 [inline]
> vhost_vsock_dev_release+0x1fb/0x3f0 drivers/vhost/vsock.c:750
> __fput+0x44c/0xa70 fs/file_table.c:468
> task_work_run+0x1d4/0x260 kernel/task_work.c:227
> exit_task_work include/linux/task_work.h:40 [inline]
> do_exit+0x6b5/0x2300 kernel/exit.c:966
> do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
> get_signal+0x1285/0x1340 kernel/signal.c:3034
> arch_do_signal_or_restart+0xa0/0x790 arch/x86/kernel/signal.c:337
> exit_to_user_mode_loop+0x72/0x130 kernel/entry/common.c:40
> exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
> syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
> syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
> do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f9941b8eba9
> Code: Unable to access opcode bytes at 0x7f9941b8eb7f.
> RSP: 002b:00007f9942aa10e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 00007f9941dd5fa8 RCX: 00007f9941b8eba9
> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f9941dd5fa8
> RBP: 00007f9941dd5fa0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f9941dd6038 R14: 00007ffd2989c130 R15: 00007ffd2989c218
> </TASK>
>
>
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
#syz test
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index 01bf7b0e2c5b..6cb3b8b26768 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -58,9 +58,15 @@ static int vhost_task_fn(void *data)
 	 * new work and flushed.
 	 */
 	if (!test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
-		set_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags);
 		if (vtsk->handle_sigkill)
 			vtsk->handle_sigkill(vtsk->data);
+
+		/*
+		 * Mark the task KILLED *after* giving the owner the chance to
+		 * handle SIGKILL to avoid false positives on the sanity check
+		 * in __vhost_task_wake().
+		 */
+		set_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags);
 	}
 	mutex_unlock(&vtsk->exit_mutex);
 	complete(&vtsk->exited);