linux-kernel - Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

Message-ID: <20220302153643.glkmvnn2czrgpoyl@sgarzare-redhat>
Date:   Wed, 2 Mar 2022 16:36:43 +0100
From:   Stefano Garzarella <sgarzare@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Lee Jones <lee.jones@...aro.org>, jasowang@...hat.com,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
        stable@...r.kernel.org,
        syzbot+adc3cb32385586bec859@...kaller.appspotmail.com
Subject: Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared
 whilst still in use

On Wed, Mar 02, 2022 at 09:50:38AM -0500, Michael S. Tsirkin wrote:
>On Wed, Mar 02, 2022 at 03:11:21PM +0100, Stefano Garzarella wrote:
>> On Wed, Mar 02, 2022 at 08:35:08AM -0500, Michael S. Tsirkin wrote:
>> > On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
>> > > On Wed, Mar 02, 2022 at 07:54:21AM +0000, Lee Jones wrote:
>> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
>> > > > to vhost_get_vq_desc().  All we have to do is take the same lock
>> > > > during virtqueue clean-up and we mitigate the reported issues.
>> > > >
>> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
>> > >
>> > > This issue is similar to [1] that should be already fixed upstream by [2].
>> > >
>> > > However I think this patch would have prevented some issues, because
>> > > vhost_vq_reset() sets vq->private to NULL, preventing the worker from
>> > > running.
>> > >
>> > > Anyway I think that when we enter in vhost_dev_cleanup() the worker should
>> > > be already stopped, so it shouldn't be necessary to take the mutex. But in
>> > > order to prevent future issues maybe it's better to take them, so:
>> > >
>> > > Reviewed-by: Stefano Garzarella <sgarzare@...hat.com>
>> > >
>> > > [1]
>> > > https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
>> > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
>> >
>> >
>> > Right. I want to queue this but I would like to get a warning
>> > so we can detect issues like [2] before they cause more issues.
>>
>> I agree, what about moving the warning that we already have higher up, right
>> at the beginning of the function?
>>
>> I mean something like this:
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 59edb5a1ffe2..1721ff3f18c0 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -692,6 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>>  {
>>         int i;
>> +       WARN_ON(!llist_empty(&dev->work_list));
>> +
>>         for (i = 0; i < dev->nvqs; ++i) {
>>                 if (dev->vqs[i]->error_ctx)
>>                         eventfd_ctx_put(dev->vqs[i]->error_ctx);
>> @@ -712,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>>         dev->iotlb = NULL;
>>         vhost_clear_msg(dev);
>>         wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
>> -       WARN_ON(!llist_empty(&dev->work_list));
>>         if (dev->worker) {
>>                 kthread_stop(dev->worker);
>>                 dev->worker = NULL;
>>
>
>Hmm I'm not sure why it matters.

Because with this new patch, which takes the locks inside the for loop,
the workers should be stopped by the time the loop finishes, since
vhost_vq_reset() sets vq->private_data to NULL.

But the best thing IMHO is to check that no backend is set for any vq,
which would confirm that the workers have been stopped correctly at
this point.
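
To illustrate the kind of check meant here, a rough userspace sketch
follows. It is not kernel code and is untested against the real driver:
struct mock_vq, struct mock_dev and both helpers are hypothetical
stand-ins, and the locking is only mentioned in comments. In vhost.c the
backend pointer would be read under vq->mutex and a non-zero count
would feed a WARN_ON().

```c
#include <stddef.h>

/* Mock stand-ins for the vhost structures; only the fields used here. */
struct mock_vq {
	void *private_data;	/* backend pointer; NULL means no backend */
};

struct mock_dev {
	struct mock_vq **vqs;
	int nvqs;
};

/*
 * Hypothetical helper: count the vqs that still have a backend set.
 * In the kernel this read would be done while holding vq->mutex.
 */
int mock_dev_count_backends(const struct mock_dev *dev)
{
	int i, busy = 0;

	for (i = 0; i < dev->nvqs; ++i) {
		if (dev->vqs[i]->private_data)
			busy++;
	}
	return busy;
}

/*
 * Hypothetical clean-up loop: clear the backend of every vq, which is
 * the part of vhost_vq_reset() that matters for this discussion.
 */
void mock_dev_cleanup(struct mock_dev *dev)
{
	int i;

	for (i = 0; i < dev->nvqs; ++i)
		dev->vqs[i]->private_data = NULL;
}
```

With such a helper, a non-zero count at the start of the clean-up would
mean a worker may still find a backend to run on, while zero after the
loop is what we expect.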

Thanks,
Stefano