Message-ID: <a71abaa5-478b-4fb7-9015-abe5f10909ef@oracle.com>
Date: Wed, 7 Jan 2026 12:07:24 -0600
From: Mike Christie <michael.christie@...cle.com>
To: "Michael S. Tsirkin" <mst@...hat.com>, Hillf Danton <hdanton@...a.com>
Cc: syzbot <syzbot+a9528028ab4ca83e8bac@...kaller.appspotmail.com>,
eperezma@...hat.com, jasowang@...hat.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
syzkaller-bugs@...glegroups.com, virtualization@...ts.linux.dev
Subject: Re: [syzbot] INFO: task hung in vhost_worker_killed (2)
On 1/6/26 1:57 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 06, 2026 at 09:46:30AM +0800, Hillf Danton wrote:
>>> Taking the vq mutex in a kill handler is probably not wise.
>>> We should have a separate lock just for handling worker
>>> assignment.
>>>
>> Better not before showing us the root cause of the hang, to
>> avoid adding a blind lock.
>
> Well, I think it's pretty clear, but the issue is that just another
> lock is not enough; we have bigger problems with this mutex.
>
> It's held around userspace accesses, so if the vhost thread gets into
> uninterruptible sleep while holding it, a userspace thread trying to
> take it with mutex_lock() will also be uninterruptible.
>
> So it propagates the uninterruptible status between vhost and a
> userspace thread.
>
> It's not a new issue but the new(ish) thread management APIs make
> it more visible.
>
> Here it's the kill handler that got hung, but it's not really limited
> to that: any ioctl can hit it, and I do not want to add another
> lock on the data path.
>
Above, are you saying that the kill handler and an ioctl are both
trying to take the virtqueue->mutex in this bug?

I've been trying to replicate this for a while, but I just can't hit
what the lockdep info from the initial email shows. We only see the
kill handler trying to take the virtqueue->mutex. Is the theory that
the reported locking info is not correct, i.e. a userspace thread is
doing an ioctl that took the mutex, but it's not reported below?
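
Just to check that I follow the propagation part, my understanding of
the scenario is roughly the below. This is only a simplified sketch,
not the actual vhost call chain, and the fault scenario in the comment
is my assumption:

	/* vhost worker thread */
	mutex_lock(&vq->mutex);
	copy_from_user(dst, src, len);	/* can block uninterruptibly,
					 * e.g. on backing store I/O */
	mutex_unlock(&vq->mutex);

	/* meanwhile, a userspace thread in an ioctl handler */
	mutex_lock(&vq->mutex);		/* mutex_lock() sleeps in
					 * TASK_UNINTERRUPTIBLE, so the
					 * D state spreads to userspace */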
Originally I was using the vhost_dev->mutex for the locking in
vhost_worker_killed, but I saw we could take that during ioctls that
do a flush, so I added the vhost_worker->mutex for some of the locking.

If the virtqueue->mutex is also an issue, I can do a patch.
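
For the separate worker-assignment lock, something along these lines
is what I would try. Rough sketch only: the worker_lock field and the
helper below are hypothetical, not current drivers/vhost/vhost.c code:

	/*
	 * Dedicated lock for worker assignment so the kill handler and
	 * ioctls never have to take vq->mutex just to change workers.
	 */
	struct vhost_virtqueue {
		/* existing fields unchanged */
		struct vhost_worker __rcu *worker;
		spinlock_t worker_lock;	/* protects worker assignment */
	};

	static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
	{
		spin_lock(&vq->worker_lock);
		rcu_assign_pointer(vq->worker, NULL);
		spin_unlock(&vq->worker_lock);
		synchronize_rcu();	/* wait out in-flight readers */
	}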
Showing all locks held in the system:
1 lock held by khungtaskd/32:
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6775
2 locks held by getty/5579:
#0: ffff88814e3cb0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc9000332b2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x449/0x1460 drivers/tty/n_tty.c:2211
1 lock held by syz-executor/5978:
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock kernel/rcu/tree_exp.h:311 [inline]
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: synchronize_rcu_expedited+0x2b1/0x6e0 kernel/rcu/tree_exp.h:956
2 locks held by syz.5.259/7601:
3 locks held by vhost-7617/7618:
#0: ffff888054cc68e8 (&vtsk->exit_mutex){+.+.}-{4:4}, at: vhost_task_fn+0x322/0x430 kernel/vhost_task.c:54
#1: ffff888024646a80 (&worker->mutex){+.+.}-{4:4}, at: vhost_worker_killed+0x57/0x390 drivers/vhost/vhost.c:470
#2: ffff8880550c0258 (&vq->mutex){+.+.}-{4:4}, at: vhost_worker_killed+0x12b/0x390 drivers/vhost/vhost.c:476
1 lock held by syz-executor/7850:
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock kernel/rcu/tree_exp.h:343 [inline]
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: synchronize_rcu_expedited+0x36e/0x6e0 kernel/rcu/tree_exp.h:956
1 lock held by syz.2.640/9940:
4 locks held by syz.3.641/9946:
3 locks held by syz.1.642/9954: