Message-ID: <CAGxU2F7CjNu5Wxg3k1hQF8A8uRt-wKLjMW6TMjb+UVCF+MHZbw@mail.gmail.com>
Date: Thu, 17 Feb 2022 10:48:18 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Jason Wang <jasowang@...hat.com>,
syzbot <syzbot+1e3ea63db39f2b4440e0@...kaller.appspotmail.com>,
kvm <kvm@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
netdev <netdev@...r.kernel.org>, syzkaller-bugs@...glegroups.com,
virtualization <virtualization@...ts.linux-foundation.org>,
Stefan Hajnoczi <stefanha@...hat.com>
Subject: Re: [syzbot] WARNING in vhost_dev_cleanup (2)

On Thu, Feb 17, 2022 at 8:50 AM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Thu, Feb 17, 2022 at 03:39:48PM +0800, Jason Wang wrote:
> > On Thu, Feb 17, 2022 at 3:36 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > >
> > > On Thu, Feb 17, 2022 at 03:34:13PM +0800, Jason Wang wrote:
> > > > On Thu, Feb 17, 2022 at 10:01 AM syzbot
> > > > <syzbot+1e3ea63db39f2b4440e0@...kaller.appspotmail.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit: c5d9ae265b10 Merge tag 'for-linus' of git://git.kernel.org..
> > > > > git tree: upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=132e687c700000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=1e3ea63db39f2b4440e0
> > > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+1e3ea63db39f2b4440e0@...kaller.appspotmail.com
> > > > >
> > > > > WARNING: CPU: 1 PID: 10828 at drivers/vhost/vhost.c:715 vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
> > > > > Modules linked in:
> > > > > CPU: 0 PID: 10828 Comm: syz-executor.0 Not tainted 5.17.0-rc4-syzkaller-00051-gc5d9ae265b10 #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > > > RIP: 0010:vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
> > > >
> > > > Probably a hint that we are missing a flush.
> > > >
> > > > Looking at vhost_vsock_stop() that is called by vhost_vsock_dev_release():
> > > >
> > > > static int vhost_vsock_stop(struct vhost_vsock *vsock)
> > > > {
> > > >         size_t i;
> > > >         int ret;
> > > >
> > > >         mutex_lock(&vsock->dev.mutex);
> > > >
> > > >         ret = vhost_dev_check_owner(&vsock->dev);
> > > >         if (ret)
> > > >                 goto err;
> > > >
> > > > Where it could fail so the device is not actually stopped.
> > > >
> > > > I wonder if this is something related.
> > > >
> > > > Thanks
> > >
> > >
> > > But then if that is not the owner then no work should be running, right?
> >
> > Could it be a buggy user space that passes the fd to another process
> > and changes the owner just before the mutex_lock() above?
> >
> > Thanks
>
> Maybe, but can you be a bit more explicit? what is the set of
> conditions you see that can lead to this?

I think the issue could be in vhost_vsock_stop(), as Jason mentioned,
but related to the do_exit() function rather than to fd passing.

Looking at the stack trace, we are in exit_task_work(), which is called
after exit_mm(), so vhost_dev_check_owner() can fail because
current->mm should be NULL at that point.
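
To make that ordering concrete, here is a minimal userspace sketch, not
kernel code. All model_* names and the struct layouts are hypothetical
stand-ins; the only thing taken from the kernel is the assumption that
vhost_dev_check_owner() compares dev->mm with current->mm and returns
-EPERM on mismatch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct mm_model { int id; };
struct task_model { struct mm_model *mm; };
struct dev_model { struct mm_model *mm; int running; };

/* Stand-in for "current". */
static struct task_model current_task;

/* Assumed to mirror vhost_dev_check_owner(): owner iff the mm matches. */
static int model_check_owner(const struct dev_model *dev)
{
	return dev->mm == current_task.mm ? 0 : -EPERM;
}

/* Mirrors the quoted start of vhost_vsock_stop(): bail out if not owner. */
static int model_stop(struct dev_model *dev)
{
	int ret = model_check_owner(dev);

	if (ret)
		return ret;	/* the device is left running */
	dev->running = 0;
	return 0;
}

/* Models do_exit(): the mm is dropped first, the release runs later. */
int model_do_exit(struct dev_model *dev)
{
	current_task.mm = NULL;		/* exit_mm() */
	return model_stop(dev);		/* exit_task_work() -> release */
}

/* Returns 1 if the stop was refused and the device is still running. */
int model_scenario(void)
{
	static struct mm_model mm = { 1 };
	struct dev_model dev = { &mm, 1 };
	int ret;

	current_task.mm = &mm;		/* the task owns the device */
	ret = model_do_exit(&dev);
	return ret == -EPERM && dev.running == 1;
}
```

In this model the owner itself is refused once its mm is gone, so the
stop path never runs before cleanup.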

It seems the fput work is queued by fput_many() as task work, and
in some cases (maybe when many files are open?) the work is still
pending when we enter do_exit().

That said, I don't know whether we can simply remove that check in
vhost_vsock_stop(), or instead check whether current->mm is NULL to
detect that the process is exiting.
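
The two options can be compared in a small userspace sketch. This is
only a model under stated assumptions, not a proposed patch: the
model_* names, current_mm, and the running flag are hypothetical, and
the owner check is assumed to be a plain dev->mm == current->mm
comparison.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct mm_model { int id; };
struct dev_model { struct mm_model *mm; bool running; };

/* Hypothetical stand-in for current->mm. */
static struct mm_model *current_mm;

static int model_check_owner(const struct dev_model *dev)
{
	return dev->mm == current_mm ? 0 : -EPERM;
}

/* Option 1: drop the owner check entirely, always stop the device. */
int model_stop_unchecked(struct dev_model *dev)
{
	dev->running = false;
	return 0;
}

/* Option 2: skip the owner check only when the caller has no mm. */
int model_stop_unless_exiting(struct dev_model *dev)
{
	if (current_mm != NULL) {
		int ret = model_check_owner(dev);

		if (ret)
			return ret;	/* a live non-owner is still refused */
	}
	dev->running = false;
	return 0;
}
```

Option 2 keeps rejecting a live non-owner while letting the exit path
through. Note that a NULL mm is not unique to exiting tasks (kernel
threads have none either), so in this model it is only a heuristic.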

Stefano