Message-ID: <20250918110101-mutt-send-email-mst@kernel.org>
Date: Thu, 18 Sep 2025 11:02:09 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: syzbot <syzbot+a1a3cefd6148c781117c@...kaller.appspotmail.com>,
linux-kernel@...r.kernel.org, luto@...nel.org, peterz@...radead.org,
syzkaller-bugs@...glegroups.com, tglx@...utronix.de
Subject: Re: [syzbot] [kernel?] WARNING in __vhost_task_wake
On Thu, Sep 18, 2025 at 07:56:48AM -0700, Sean Christopherson wrote:
> +Michael
>
> Michael, this is the VHOST_TASK_FLAGS_KILLED WARN that was added[*] to detect
> violations similar to KVM.
>
> 	/*
> 	 * Checking VHOST_TASK_FLAGS_KILLED can race with signal delivery, but
> 	 * a race can only result in false negatives and this is just a sanity
> 	 * check, i.e. if KILLED is set, the caller is buggy no matter what.
> 	 */
> 	if (WARN_ON_ONCE(test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)))
> 		return;
>
> I haven't been able to repro the splat, but after much staring I think the issue
> is that vhost_task_fn() marks the task KILLED before invoking handle_sigkill().
> If vhost_worker_flush() already holds worker->mutex before vhost_worker_killed()
> runs, then it could wake a (not yet dead) task that has KILLED set.
>
> Assuming waiting to set KILLED until after handle_sigkill() resolves the issue
> (fingers crossed), the two options I see would be to apply the below as fixup,
> or simply drop the sanity check for 6.17 and add it back in 6.18 in conjunction
> with the below (again, assuming it actually resolves the issue).
>
> [*] https://lore.kernel.org/all/20250827194107.4142164-2-seanjc@google.com
I just sent this one to Linus. Enough?
> On Wed, Sep 17, 2025, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: ae2d20002576 Add linux-next specific files for 20250917
> > git tree: linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11678f62580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=d737cfaddae0058c
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a1a3cefd6148c781117c
> > compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1790ef62580000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10242534580000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/96197382e3c0/disk-ae2d2000.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/55a8a6ba3307/vmlinux-ae2d2000.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/c1b4ed5d6e2c/bzImage-ae2d2000.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+a1a3cefd6148c781117c@...kaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > WARNING: kernel/vhost_task.c:97 at __vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97, CPU#0: syz.0.174/6507
> > Modules linked in:
> > CPU: 0 UID: 0 PID: 6507 Comm: syz.0.174 Not tainted syzkaller #0 PREEMPT(full)
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
> > RIP: 0010:__vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97
> > Code: 38 00 74 08 48 89 df e8 93 81 95 00 48 8b 3b 5b 41 5e 41 5f e9 a6 45 01 00 e8 31 ef 30 00 90 0f 0b 90 eb 8b e8 26 ef 30 00 90 <0f> 0b 90 5b 41 5e 41 5f e9 18 c7 ff 09 cc 0f 1f 80 00 00 00 00 90
> > RSP: 0018:ffffc90003b7f680 EFLAGS: 00010293
> > RAX: ffffffff818f2d7a RBX: ffff888033c7c400 RCX: ffff88802bc85ac0
> > RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
> > RBP: ffffc90003b7f750 R08: ffff888033c7c477 R09: 1ffff1100678f88e
> > R10: dffffc0000000000 R11: ffffed100678f88f R12: 1ffff9200076fed8
> > R13: dffffc0000000000 R14: 0000000000000002 R15: dffffc0000000000
> > FS: 0000000000000000(0000) GS:ffff88812579c000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f9942aa0f98 CR3: 0000000027bfc000 CR4: 00000000003526f0
> > Call Trace:
> > <TASK>
> > vhost_worker_queue+0x194/0x260 drivers/vhost/vhost.c:253
> > __vhost_worker_flush+0x134/0x1e0 drivers/vhost/vhost.c:290
> > vhost_worker_flush drivers/vhost/vhost.c:303 [inline]
> > vhost_dev_flush+0xb2/0x130 drivers/vhost/vhost.c:313
> > vhost_vsock_flush drivers/vhost/vsock.c:698 [inline]
> > vhost_vsock_dev_release+0x1fb/0x3f0 drivers/vhost/vsock.c:750
> > __fput+0x44c/0xa70 fs/file_table.c:468
> > task_work_run+0x1d4/0x260 kernel/task_work.c:227
> > exit_task_work include/linux/task_work.h:40 [inline]
> > do_exit+0x6b5/0x2300 kernel/exit.c:966
> > do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
> > get_signal+0x1285/0x1340 kernel/signal.c:3034
> > arch_do_signal_or_restart+0xa0/0x790 arch/x86/kernel/signal.c:337
> > exit_to_user_mode_loop+0x72/0x130 kernel/entry/common.c:40
> > exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
> > syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
> > syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
> > do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f9941b8eba9
> > Code: Unable to access opcode bytes at 0x7f9941b8eb7f.
> > RSP: 002b:00007f9942aa10e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> > RAX: fffffffffffffe00 RBX: 00007f9941dd5fa8 RCX: 00007f9941b8eba9
> > RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f9941dd5fa8
> > RBP: 00007f9941dd5fa0 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007f9941dd6038 R14: 00007ffd2989c130 R15: 00007ffd2989c218
> > </TASK>
> >
> >
> > ---
> > If you want syzbot to run the reproducer, reply with:
> > #syz test: git://repo/address.git branch-or-commit-hash
> > If you attach or paste a git patch, syzbot will apply it before testing.
>
> #syz test
>
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index 01bf7b0e2c5b..6cb3b8b26768 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -58,9 +58,15 @@ static int vhost_task_fn(void *data)
>  	 * new work and flushed.
>  	 */
>  	if (!test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
> -		set_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags);
>  		if (vtsk->handle_sigkill)
>  			vtsk->handle_sigkill(vtsk->data);
> +
> +		/*
> +		 * Mark the task KILLED *after* giving the owner the chance to
> +		 * handle SIGKILL to avoid false positives on the sanity check
> +		 * in __vhost_task_wake().
> +		 */
> +		set_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags);
>  	}
>  	mutex_unlock(&vtsk->exit_mutex);
>  	complete(&vtsk->exited);
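
For illustration only, the ordering argument in the quoted analysis can be sketched as a single-threaded userspace model. Nothing here is kernel code: `model_task`, `model_wake()`, `model_handle_sigkill()`, `model_exit_path()` and `model_run()` are hypothetical names, and the "flusher already holds the mutex" interleaving is an assumption, modeled by letting the SIGKILL handler perform one wake before it finishes. It shows why setting KILLED before the handler can trip the sanity check while setting it afterwards does not; it does not confirm that the patch fixes the reported splat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits that only mirror the names used in the thread. */
enum {
	MODEL_FLAG_STOP   = 1u << 0,
	MODEL_FLAG_KILLED = 1u << 1,
};

struct model_task {
	unsigned int flags;
	int warnings;	/* how often the sanity check fired */
	int wakeups;	/* how often a wake went through */
	void (*handle_sigkill)(struct model_task *t);
};

/* Stand-in for the wake path: warn and bail out if KILLED is already set. */
static void model_wake(struct model_task *t)
{
	if (t->flags & MODEL_FLAG_KILLED) {
		t->warnings++;
		return;
	}
	t->wakeups++;
}

/*
 * Stand-in for the owner's SIGKILL handler. Assumed interleaving: a flusher
 * that already holds the worker mutex wakes the task before the handler
 * gets to run its own teardown.
 */
static void model_handle_sigkill(struct model_task *t)
{
	model_wake(t);
}

/* Stand-in for the exit path, with the position of the KILLED update selectable. */
static void model_exit_path(struct model_task *t, bool set_killed_first)
{
	if (t->flags & MODEL_FLAG_STOP)
		return;

	if (set_killed_first)
		t->flags |= MODEL_FLAG_KILLED;	/* ordering before the patch */

	if (t->handle_sigkill != NULL)
		t->handle_sigkill(t);

	if (!set_killed_first)
		t->flags |= MODEL_FLAG_KILLED;	/* ordering after the patch */
}

/* Run one simulated exit and return how many times the check fired. */
static int model_run(bool set_killed_first)
{
	struct model_task t = { .handle_sigkill = model_handle_sigkill };

	model_exit_path(&t, set_killed_first);
	assert(t.flags & MODEL_FLAG_KILLED);	/* both orderings end with KILLED set */
	return t.warnings;
}
```

Calling `model_run(true)` models the original ordering and counts one spurious warning; `model_run(false)` models the patched ordering and counts none, because the simulated flusher wake happens while KILLED is still clear.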