netdev - Re: [PATCH v7 24/26] virtio_net: support rx/tx queue reset

Message-ID: <1646820327.1766295-14-xuanzhuo@linux.alibaba.com>
Date:   Wed, 9 Mar 2022 18:05:27 +0800
From:   Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     Jeff Dike <jdike@...toit.com>, Richard Weinberger <richard@....at>,
        Anton Ivanov <anton.ivanov@...bridgegreys.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Hans de Goede <hdegoede@...hat.com>,
        Mark Gross <markgross@...nel.org>,
        Vadim Pasternak <vadimp@...dia.com>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Mathieu Poirier <mathieu.poirier@...aro.org>,
        Cornelia Huck <cohuck@...hat.com>,
        Halil Pasic <pasic@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Johannes Berg <johannes.berg@...el.com>,
        Vincent Whitchurch <vincent.whitchurch@...s.com>,
        linux-um@...ts.infradead.org, platform-driver-x86@...r.kernel.org,
        linux-remoteproc@...r.kernel.org, linux-s390@...r.kernel.org,
        kvm@...r.kernel.org, bpf@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Subject: Re: [PATCH v7 24/26] virtio_net: support rx/tx queue reset

On Wed, 9 Mar 2022 17:14:34 +0800, Jason Wang <jasowang@...hat.com> wrote:
>
> On 2022/3/8 8:35 PM, Xuan Zhuo wrote:
> > This patch implements reset for the rx and tx queues.
> >
> > Based on this function, it is possible to change the ring num of a
> > queue and to quickly recycle the buffers in the queue.
> >
> > When disabling the queue, in theory there will be no errors as long as
> > virtio supports queue reset.
> >
> > However, when re-enabling the queue, errors may occur due to memory
> > allocation. In that case the vq is unusable, but we still have to call
> > napi_enable(): because napi_disable() behaves like a lock, napi_enable()
> > must be called after napi_disable().
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 107 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 409a8e180918..ffff323dcef0 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -251,6 +251,11 @@ struct padded_vnet_hdr {
> >   	char padding[4];
> >   };
> >
> > +static void virtnet_sq_free_unused_bufs(struct virtnet_info *vi,
> > +					struct send_queue *sq);
> > +static void virtnet_rq_free_unused_bufs(struct virtnet_info *vi,
> > +					struct receive_queue *rq);
> > +
> >   static bool is_xdp_frame(void *ptr)
> >   {
> >   	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> > @@ -1369,6 +1374,9 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> >   {
> >   	napi_enable(napi);
> >
> > +	if (vq->reset)
> > +		return;
> > +
>
>
> Let's WARN_ONCE() here?
>
>
> >   	/* If all buffers were filled by other side before we napi_enabled, we
> >   	 * won't get another interrupt, so process any outstanding packets now.
> >   	 * Call local_bh_enable after to trigger softIRQ processing.
> > @@ -1413,6 +1421,10 @@ static void refill_work(struct work_struct *work)
> >   		struct receive_queue *rq = &vi->rq[i];
> >
> >   		napi_disable(&rq->napi);
> > +		if (rq->vq->reset) {
> > +			virtnet_napi_enable(rq->vq, &rq->napi);
> > +			continue;
> > +		}
>
>
> This seems racy and it's a hint that we need to sync with the refill work
> during reset, like what we did in virtnet_close():
>
>          /* Make sure refill_work doesn't re-enable napi! */
>          cancel_delayed_work_sync(&vi->refill);
>
>
> >   		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> >   		virtnet_napi_enable(rq->vq, &rq->napi);
> >
> > @@ -1523,6 +1535,9 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >   	if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> >   		return;
> >
> > +	if (sq->vq->reset)
> > +		return;
>
>
> It looks to me we'd better either WARN or just remove this, since it
> looks like a workaround for the unsynchronized NAPI somehow.
>

During the reset process, both the ring reset and the queue enable may fail.
On failure, the vq is left unusable. All three checks guard against this
situation.

Even on failure, napi still has to be enabled again; otherwise napi_disable()
would get stuck when the network card is later closed.


So the first and second checks above cover the case where napi is enabled but
the vq either failed to reset or is still being reset.

The third check handles the case where tx is being reset while rx is still
running: rx NAPI calls virtnet_poll_cleantx(), which would otherwise access the
sq's vq.
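A rough userspace model of these rules may help. It is only a sketch under stated assumptions: `struct model_queue`, `model_queue_reset()`, `model_poll()` and the `fail_step` parameter are hypothetical and are not kernel APIs, and the model assumes (as described above) that a failed vring reset or queue enable leaves the vq marked as in reset. It shows two things: NAPI is re-enabled on every exit path, so a later disable on close cannot hang, and the poll path backs off while the vq is in reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code: it only mirrors the
 * ordering rules of napi_disable()/napi_enable() and vq->reset.
 */
struct model_napi {
	bool disabled;		/* set by disable, cleared by enable */
};

struct model_vq {
	bool reset;		/* in reset, or unusable after a failed reset */
	unsigned int num;	/* ring size */
};

struct model_queue {
	struct model_napi napi;
	struct model_vq vq;
};

/*
 * The real napi_disable() waits until it owns NAPI, so a second disable
 * without an enable in between would hang; the model reports it instead.
 */
static bool model_napi_disable(struct model_napi *n)
{
	if (n->disabled)
		return false;
	n->disabled = true;
	return true;
}

static void model_napi_enable(struct model_napi *n)
{
	assert(n->disabled);	/* enabling NAPI that was never disabled is a bug */
	n->disabled = false;
}

/* Poll path: do not touch a vq that is in reset or left unusable. */
static int model_poll(const struct model_queue *q)
{
	if (q->vq.reset)
		return 0;
	return 1;		/* pretend one buffer was processed */
}

/* fail_step: 0 = no failure, 1 = vring reset fails, 2 = enable fails. */
static int model_queue_reset(struct model_queue *q, unsigned int ring_num,
			     int fail_step)
{
	int err = 0;

	if (!model_napi_disable(&q->napi))
		return -EBUSY;

	q->vq.reset = true;		/* like virtio_reset_vq() */

	if (fail_step == 1) {		/* like virtqueue_reset_vring() failing */
		err = -ENOMEM;
		goto out;
	}
	if (ring_num)
		q->vq.num = ring_num;

	if (fail_step == 2) {		/* like virtio_enable_resetq() failing */
		err = -ENOMEM;
		goto out;
	}
	q->vq.reset = false;

out:
	/* Re-enable NAPI on every path so a later disable (e.g. close) works. */
	model_napi_enable(&q->napi);
	return err;
}
```

For example, a reset whose enable step fails (`fail_step == 2`) returns `-ENOMEM` and leaves `vq.reset` set, so `model_poll()` returns 0, yet NAPI is enabled again and the next `model_napi_disable()` succeeds instead of hanging.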




>
> > +
> >   	if (__netif_tx_trylock(txq)) {
> >   		do {
> >   			virtqueue_disable_cb(sq->vq);
> > @@ -1769,6 +1784,98 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   	return NETDEV_TX_OK;
> >   }
> >
> > +static int virtnet_rx_vq_reset(struct virtnet_info *vi,
> > +			       struct receive_queue *rq, u32 ring_num)
>
>
> It's better to rename this as virtnet_rx_resize().


I don't think "resize" is the right name, because resizing is only one effect
of a reset. In the af_xdp case, we will call it with ring_num == 0 just to reset
the queue and free its buffers, without resizing.

So virtnet_rx_reset() might be better.
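A small sketch of the semantics argued for here, where `ring_num == 0` means "reset and reclaim the buffers, but keep the current size". `struct model_rxq` and `model_rx_reset()` are hypothetical names for illustration only; the real function also stops NAPI and refills the ring, which this model omits.

```c
#include <assert.h>

/* Hypothetical model of one rx queue; not kernel code. */
struct model_rxq {
	unsigned int num;	/* ring size */
	unsigned int pending;	/* buffers still posted to the device */
};

/*
 * Reset the queue and return how many buffers were reclaimed.
 * ring_num == 0 keeps the current size, so resizing is optional.
 */
static unsigned int model_rx_reset(struct model_rxq *q, unsigned int ring_num)
{
	unsigned int freed = q->pending;

	q->pending = 0;		/* like virtnet_rq_free_unused_bufs() */
	if (ring_num)		/* resize only when a new size is given */
		q->num = ring_num;
	return freed;
}
```

Called with `ring_num == 0` on a 256-entry queue holding 100 posted buffers, the model reclaims all 100 and leaves the size at 256; only a non-zero `ring_num` changes the size, which is why the reset, not the resize, is the primary operation.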

>
>
> > +{
> > +	int err;
> > +
> > +	/* stop napi */
> > +	napi_disable(&rq->napi);
> > +
>
>
> Here, as discussed above, we need to synchronize with the refill work.
>
>
> > +	/* reset the queue */
> > +	err = virtio_reset_vq(rq->vq);
> > +	if (err)
> > +		goto err;
>
>
> Btw, most comments in this function seem useless since the code already
> explains itself.

OK, I will remove these.

>
>
> > +
> > +	/* free bufs */
> > +	virtnet_rq_free_unused_bufs(vi, rq);
> > +
> > +	/* reset vring. */
> > +	err = virtqueue_reset_vring(rq->vq, ring_num);
> > +	if (err)
> > +		goto err;
> > +
> > +	/* enable reset queue */
> > +	err = virtio_enable_resetq(rq->vq);
> > +	if (err)
> > +		goto err;
> > +
> > +	/* fill recv */
> > +	if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > +		schedule_delayed_work(&vi->refill, 0);
> > +
> > +	/* enable napi */
> > +	virtnet_napi_enable(rq->vq, &rq->napi);
> > +	return 0;
> > +
> > +err:
> > +	netdev_err(vi->dev,
> > +		   "reset rx reset vq fail: rx queue index: %ld err: %d\n",
> > +		   rq - vi->rq, err);
> > +	virtnet_napi_enable(rq->vq, &rq->napi);
> > +	return err;
> > +}
> > +
> > +static int virtnet_tx_vq_reset(struct virtnet_info *vi,
> > +			       struct send_queue *sq, u32 ring_num)
> > +{
>
>
> It looks to me it's better to rename this as "virtnet_tx_resize()"
>
>
> > +	struct netdev_queue *txq;
> > +	int err, qindex;
> > +
> > +	qindex = sq - vi->sq;
> > +
> > +	txq = netdev_get_tx_queue(vi->dev, qindex);
> > +	__netif_tx_lock_bh(txq);
> > +
> > +	/* stop tx queue and napi */
> > +	netif_stop_subqueue(vi->dev, qindex);
> > +	virtnet_napi_tx_disable(&sq->napi);
>
>
> There's no need to hold tx lock for napi disable.

The main purpose of the tx lock is to wait for any in-progress xmit calls to
finish and then call netif_stop_subqueue().
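The ordering can be modelled in userspace as below. This is a single-threaded sketch with hypothetical names (`struct model_txq`, `model_stack_xmit()`, `model_tx_stop_for_reset()`, `model_tx_restart()`), and the boolean only stands in for the per-queue xmit lock. It assumes, roughly as the core stack does, that the stopped state is checked while holding the xmit lock before the driver's transmit function is called, so once the reset path has set the stopped state under that lock, no new transmit reaches the vq.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of one tx queue; not kernel code. */
enum model_tx_ret { MODEL_TX_OK, MODEL_TX_STOPPED };

struct model_txq {
	bool xmit_lock_held;	/* stands in for the per-queue xmit lock */
	bool stopped;		/* stands in for the netif_stop_subqueue() state */
	unsigned int vq_bufs;	/* buffers posted to the send vq */
};

/* In the kernel, taking the lock waits for the holder to finish; the
 * single-threaded model can only assert that nobody holds it. */
static void model_tx_lock(struct model_txq *q)
{
	assert(!q->xmit_lock_held);
	q->xmit_lock_held = true;
}

static void model_tx_unlock(struct model_txq *q)
{
	q->xmit_lock_held = false;
}

/* Stack side: check 'stopped' under the lock before calling the driver. */
static enum model_tx_ret model_stack_xmit(struct model_txq *q)
{
	enum model_tx_ret ret = MODEL_TX_STOPPED;

	model_tx_lock(q);
	if (!q->stopped) {
		q->vq_bufs++;	/* the driver's xmit posts a buffer */
		ret = MODEL_TX_OK;
	}
	model_tx_unlock(q);
	return ret;
}

/* Reset side: once this returns, no transmit is in flight or can start. */
static void model_tx_stop_for_reset(struct model_txq *q)
{
	model_tx_lock(q);	/* like __netif_tx_lock_bh() */
	q->stopped = true;	/* like netif_stop_subqueue() */
	model_tx_unlock(q);	/* like __netif_tx_unlock_bh() */
}

static void model_tx_restart(struct model_txq *q)
{
	q->stopped = false;	/* like netif_start_subqueue() */
}
```

After `model_tx_stop_for_reset()`, `model_stack_xmit()` returns `MODEL_TX_STOPPED` without touching `vq_bufs`, so the vq can be reset safely; `model_tx_restart()` lets transmits through again.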

Thanks.

>
> Thanks
>
>
> > +
> > +	__netif_tx_unlock_bh(txq);
> > +
> > +	/* reset the queue */
> > +	err = virtio_reset_vq(sq->vq);
> > +	if (err) {
> > +		netif_start_subqueue(vi->dev, qindex);
> > +		goto err;
> > +	}
> > +
> > +	/* free bufs */
> > +	virtnet_sq_free_unused_bufs(vi, sq);
> > +
> > +	/* reset vring. */
> > +	err = virtqueue_reset_vring(sq->vq, ring_num);
> > +	if (err)
> > +		goto err;
> > +
> > +	/* enable reset queue */
> > +	err = virtio_enable_resetq(sq->vq);
> > +	if (err)
> > +		goto err;
> > +
> > +	/* start tx queue and napi */
> > +	netif_start_subqueue(vi->dev, qindex);
> > +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > +	return 0;
> > +
> > +err:
> > +	netdev_err(vi->dev,
> > +		   "reset tx reset vq fail: tx queue index: %ld err: %d\n",
> > +		   sq - vi->sq, err);
> > +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > +	return err;
> > +}
> > +
> >   /*
> >    * Send command via the control virtqueue and check status.  Commands
> >    * supported by the hypervisor, as indicated by feature bits, should
>