Message-ID: <ba4dbc25-f912-fb34-a0e2-c6c85b34b918@yandex-team.com>
Date: Mon, 22 Nov 2021 18:12:59 +0300
From: Andrey Ryabinin <arbn@...dex-team.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Jason Wang <jasowang@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
kvm <kvm@...r.kernel.org>,
virtualization <virtualization@...ts.linux-foundation.org>,
netdev <netdev@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>, bpf@...r.kernel.org
Subject: Re: [PATCH 6/6] vhost_net: use RCU callbacks instead of
synchronize_rcu()
On 11/22/21 12:37 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 19, 2021 at 02:32:05PM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 11/16/21 8:00 AM, Jason Wang wrote:
>>> On Mon, Nov 15, 2021 at 11:32 PM Andrey Ryabinin <arbn@...dex-team.com> wrote:
>>>>
>>>> Currently vhost_net_release() uses synchronize_rcu() to synchronize
>>>> freeing with vhost_zerocopy_callback(). However, synchronize_rcu()
>>>> is a quite costly operation. It takes more than 10 seconds
>>>> to shut down a qemu instance launched with a couple of net devices like this:
>>>> -netdev tap,id=tap0,..,vhost=on,queues=80
>>>> because we end up calling synchronize_rcu() netdev_count*queues times.
>>>>
>>>> Fix the problem by freeing the vhost net structures in an RCU callback
>>>> instead of using synchronize_rcu().
>>>
>>> I admit the release code is somewhat hard to understand. But I wonder
>>> if the following case can still happen with this:
>>>
>>> CPU 0 (vhost_dev_cleanup)        CPU 1 (vhost_net_zerocopy_callback()
>>>                                         ->vhost_work_queue())
>>>
>>>                                  if (!dev->worker)
>>> dev->worker = NULL
>>>
>>>                                  wake_up_process(dev->worker)
>>>
>>> If this is true, it seems the fix is to move the RCU synchronization
>>> stuff into vhost_net_ubuf_put_and_wait()?
>>>
>>
>> It all depends on whether vhost_zerocopy_callback() can be called outside of the vhost
>> thread context or not. If it can run after the vhost thread has stopped, then the race you
>> describe seems possible and the fix in commit b0c057ca7e83 ("vhost: fix a theoretical race in device cleanup")
>> wasn't complete. I would fix it by calling synchronize_rcu() after vhost_net_flush()
>> and before vhost_dev_cleanup().
>>
>> As for the performance problem, it can be solved by replacing synchronize_rcu() with synchronize_rcu_expedited().
>
> expedited causes a storm of IPIs though, so it's problematic to
> do it upon a userspace syscall.
>
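An alternative is to not wait for the grace period in the release path at
all and hand the whole cleanup to an rcu_work: queue_rcu_work() runs the
work item from a workqueue only after a full grace period has elapsed, so
nothing is freed while a callback can still be running, close() doesn't
block, and no expedited IPIs are needed. call_rcu() alone wouldn't do here,
since its callback runs in softirq context while vhost_dev_cleanup() and
sockfd_put() may sleep. A minimal sketch of the pattern, with made-up names
(struct foo, foo_release()), not the vhost code itself:

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct foo {
	/* ... state that RCU readers may still be looking at ... */
	struct rcu_work rwork;		/* work_struct + rcu_head */
};

static void foo_cleanup(struct work_struct *work)
{
	struct foo *f = container_of(to_rcu_work(work), struct foo, rwork);

	/*
	 * Runs from a workqueue (process context, may sleep), and only
	 * after a full RCU grace period, so no reader still sees 'f'.
	 */
	kfree(f);
}

static void foo_release(struct foo *f)
{
	INIT_RCU_WORK(&f->rwork, foo_cleanup);
	/* Returns immediately; the caller never waits for the grace period. */
	queue_rcu_work(system_wq, &f->rwork);
}
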
How about something like this?
---
drivers/vhost/net.c | 40 ++++++++++++++++++++++++++--------------
1 file changed, 26 insertions(+), 14 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 97a209d6a527..556df26c584d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -144,6 +144,10 @@ struct vhost_net {
struct page_frag page_frag;
/* Refcount bias of page frag */
int refcnt_bias;
+
+ struct socket *tx_sock;
+ struct socket *rx_sock;
+ struct rcu_work rwork;
};
static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -1389,6 +1393,24 @@ static void vhost_net_flush(struct vhost_net *n)
}
}
+static void vhost_net_cleanup(struct work_struct *work)
+{
+ struct vhost_net *n =
+ container_of(to_rcu_work(work), struct vhost_net, rwork);
+ vhost_dev_cleanup(&n->dev);
+ vhost_net_vq_reset(n);
+ if (n->tx_sock)
+ sockfd_put(n->tx_sock);
+ if (n->rx_sock)
+ sockfd_put(n->rx_sock);
+ kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
+ kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
+ kfree(n->dev.vqs);
+ if (n->page_frag.page)
+ __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
+ kvfree(n);
+}
+
static int vhost_net_release(struct inode *inode, struct file *f)
{
struct vhost_net *n = f->private_data;
@@ -1398,21 +1420,11 @@ static int vhost_net_release(struct inode *inode, struct file *f)
vhost_net_stop(n, &tx_sock, &rx_sock);
vhost_net_flush(n);
vhost_dev_stop(&n->dev);
- vhost_dev_cleanup(&n->dev);
- vhost_net_vq_reset(n);
- if (tx_sock)
- sockfd_put(tx_sock);
- if (rx_sock)
- sockfd_put(rx_sock);
- /* Make sure no callbacks are outstanding */
- synchronize_rcu();
+ n->tx_sock = tx_sock;
+ n->rx_sock = rx_sock;
- kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
- kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
- kfree(n->dev.vqs);
- if (n->page_frag.page)
- __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
- kvfree(n);
+ INIT_RCU_WORK(&n->rwork, vhost_net_cleanup);
+ queue_rcu_work(system_wq, &n->rwork);
return 0;
}
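
With this, vhost_net_release() returns without waiting for an RCU grace
period: vhost_dev_cleanup(), the sockfd_put()s and the final kvfree() all
run from system_wq once the grace period has elapsed, which is why
tx_sock/rx_sock now have to be stashed in struct vhost_net. And if the race
Jason described can really happen, deferring vhost_dev_cleanup() itself past
the grace period should cover it as well, since dev->worker is only cleared
after the outstanding callbacks the grace period waits for have finished.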
--