Message-ID: <d72fcbad-e9b0-c0ed-86e3-30723dae2c74@redhat.com>
Date: Wed, 19 Jul 2017 10:35:12 +0800
From: Jason Wang <jasowang@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 4/5] virtio-net: do not reset during XDP set
On 2017-07-19 03:49, Michael S. Tsirkin wrote:
> On Mon, Jul 17, 2017 at 08:44:00PM +0800, Jason Wang wrote:
>> We used to reset during XDP set, the main reason is we need allocate
>> extra headroom for header adjustment but there's no way to know the
>> headroom of exist receive buffer. This works buy maybe complex and may
>> cause the network down for a while which is bad for user
>> experience. So this patch tries to avoid this by:
>>
>> - packing headroom into receive buffer ctx
>> - check the headroom during XDP, and if it was not sufficient, copy
>> the packet into a location which has a large enough headroom
> The packing is actually done by previous patches. Here is a
> corrected version:
>
> We currently reset the device during XDP set, the main reason is
> that we allocate more headroom with XDP (for header adjustment).
>
> This works but causes network downtime for users.
>
> Previous patches encoded the headroom in the buffer context,
> this makes it possible to detect the case where a buffer
> with headroom insufficient for XDP is added to the queue and
> XDP is enabled afterwards.
>
> Upon detection, we handle this case by copying the packet
> (slow, but it's a temporary condition).
Ok.
>
>
>> Signed-off-by: Jason Wang <jasowang@...hat.com>
>> ---
>> drivers/net/virtio_net.c | 230 ++++++++++++++++++++++-------------------------
>> 1 file changed, 105 insertions(+), 125 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index e31b5b2..e732bd6 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -407,6 +407,67 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>> return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
>> }
>>
>> +/* We copy and linearize packet in the following cases:
>> + *
>> + * 1) Packet across multiple buffers, this happens normally when rx
>> + * buffer size is underestimated. Rarely, since spec does not
>> + * forbid using more than one buffer even if a single buffer is
>> + * sufficient for the packet, we should also deal with this case.
> Latest SVN of the spec actually forbids this. See:
> net: clarify device rules for mergeable buffers
Good to know this.
>
>
>> + * 2) The header room is smaller than what XDP required. In this case
>> + * we should copy the packet and reserve enough headroom for this.
>> + * This would be slow but we at most we can copy times of queue
>> + * size, this is acceptable. What's more important, this help to
>> + * avoid resetting.
> Last part of the comment applies to both cases. So
>
> +/* We copy the packet for XDP in the following cases:
> + *
> + * 1) Packet is scattered across multiple rx buffers.
> + * 2) Headroom space is insufficient.
> + *
> + * This is inefficient but it's a temporary condition that
> + * we hit right after XDP is enabled and until queue is refilled
> + * with large buffers with sufficient headroom - so it should affect
> + * at most queue size packets.
>
> + * Afterwards, the conditions to enable
> + * XDP should preclude the underlying device from sending packets
> + * across multiple buffers (num_buf > 1), and we make sure buffers
> + * have enough headroom.
> + */
>
Ok.
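The bounded copy path can be sketched in userspace C (illustrative only; PAGE_SZ and HEADROOM stand in for the kernel's PAGE_SIZE and VIRTIO_XDP_HEADROOM, and fragment arrays stand in for the virtqueue buffers):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ  4096
#define HEADROOM 256  /* stand-in for VIRTIO_XDP_HEADROOM */

/* Model of xdp_linearize_page(): copy a chain of fragments into one
 * page, reserving HEADROOM at the front, and reject chains that would
 * overflow the page (the "uncooperative backend" guard). Returns the
 * linear buffer or NULL; *len becomes the packet length excluding
 * headroom. */
static char *linearize(char **frags, const size_t *frag_len, int nfrags,
		       size_t *len)
{
	char *page = malloc(PAGE_SZ);
	size_t off = HEADROOM;
	int i;

	if (!page)
		return NULL;

	for (i = 0; i < nfrags; i++) {
		if (off + frag_len[i] > PAGE_SZ) {  /* would overflow */
			free(page);
			return NULL;
		}
		memcpy(page + off, frags[i], frag_len[i]);
		off += frag_len[i];
	}
	/* Headroom does not contribute to packet length. */
	*len = off - HEADROOM;
	return page;
}
```

The guard is what keeps a misbehaving backend from writing past one page even when it splits a packet across many buffers.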
>> + */
>> +static struct page *xdp_linearize_page(struct receive_queue *rq,
>> + u16 *num_buf,
>> + struct page *p,
>> + int offset,
>> + int page_off,
>> + unsigned int *len)
>> +{
>> + struct page *page = alloc_page(GFP_ATOMIC);
>> +
>> + if (!page)
>> + return NULL;
>> +
>> + memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
>> + page_off += *len;
>> +
>> + while (--*num_buf) {
>> + unsigned int buflen;
>> + void *buf;
>> + int off;
>> +
>> + buf = virtqueue_get_buf(rq->vq, &buflen);
>> + if (unlikely(!buf))
>> + goto err_buf;
>> +
>> + p = virt_to_head_page(buf);
>> + off = buf - page_address(p);
>> +
>> + /* guard against a misconfigured or uncooperative backend that
>> + * is sending packet larger than the MTU.
>> + */
>> + if ((page_off + buflen) > PAGE_SIZE) {
>> + put_page(p);
>> + goto err_buf;
>> + }
>> +
>> + memcpy(page_address(page) + page_off,
>> + page_address(p) + off, buflen);
>> + page_off += buflen;
>> + put_page(p);
>> + }
>> +
>> + /* Headroom does not contribute to packet length */
>> + *len = page_off - VIRTIO_XDP_HEADROOM;
>> + return page;
>> +err_buf:
>> + __free_pages(page, 0);
>> + return NULL;
>> +}
>> +
>> static struct sk_buff *receive_small(struct net_device *dev,
>> struct virtnet_info *vi,
>> struct receive_queue *rq,
>> @@ -415,12 +476,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
>> {
>> struct sk_buff *skb;
>> struct bpf_prog *xdp_prog;
>> - unsigned int xdp_headroom = virtnet_get_headroom(vi);
>> + unsigned int xdp_headroom = (unsigned long)ctx;
>> unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
>> unsigned int headroom = vi->hdr_len + header_offset;
>> unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
>> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> + struct page *page = virt_to_head_page(buf);
>> unsigned int delta = 0;
>> + struct page *xdp_page;
>> len -= vi->hdr_len;
>>
>> rcu_read_lock();
>> @@ -434,6 +497,27 @@ static struct sk_buff *receive_small(struct net_device *dev,
>> if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>> goto err_xdp;
>>
>> + if (unlikely(xdp_headroom != virtnet_get_headroom(vi))) {
> Should this be xdp_headroom < virtnet_get_headroom(vi)?
> Just in case we add more modes down the road.
Yes, this looks better.
>
>
>> + int offset = buf - page_address(page) + header_offset;
>> + unsigned int tlen = len + vi->hdr_len;
>> + u16 num_buf = 1;
>> +
>> + xdp_headroom = virtnet_get_headroom(vi);
>> + header_offset = VIRTNET_RX_PAD + xdp_headroom;
>> + headroom = vi->hdr_len + header_offset;
>> + buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
>> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> + xdp_page = xdp_linearize_page(rq, &num_buf, page,
>> + offset, header_offset,
>> + &tlen);
>> + if (!xdp_page)
>> + goto err_xdp;
>> +
>> + buf = page_address(xdp_page);
>> + put_page(page);
>> + page = xdp_page;
>> + }
>> +
>> xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
>> xdp.data = xdp.data_hard_start + xdp_headroom;
>> xdp.data_end = xdp.data + len;
>> @@ -462,7 +546,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>
>> skb = build_skb(buf, buflen);
>> if (!skb) {
>> - put_page(virt_to_head_page(buf));
>> + put_page(page);
>> goto err;
>> }
>> skb_reserve(skb, headroom - delta);
>> @@ -478,7 +562,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>> err_xdp:
>> rcu_read_unlock();
>> dev->stats.rx_dropped++;
>> - put_page(virt_to_head_page(buf));
>> + put_page(page);
>> xdp_xmit:
>> return NULL;
>> }
>> @@ -503,66 +587,6 @@ static struct sk_buff *receive_big(struct net_device *dev,
>> return NULL;
>> }
>>
>> -/* The conditions to enable XDP should preclude the underlying device from
>> - * sending packets across multiple buffers (num_buf > 1). However per spec
>> - * it does not appear to be illegal to do so but rather just against convention.
>> - * So in order to avoid making a system unresponsive the packets are pushed
>> - * into a page and the XDP program is run. This will be extremely slow and we
>> - * push a warning to the user to fix this as soon as possible. Fixing this may
>> - * require resolving the underlying hardware to determine why multiple buffers
>> - * are being received or simply loading the XDP program in the ingress stack
>> - * after the skb is built because there is no advantage to running it here
>> - * anymore.
>> - */
>> -static struct page *xdp_linearize_page(struct receive_queue *rq,
>> - u16 *num_buf,
>> - struct page *p,
>> - int offset,
>> - unsigned int *len)
>> -{
>> - struct page *page = alloc_page(GFP_ATOMIC);
>> - unsigned int page_off = VIRTIO_XDP_HEADROOM;
>> -
>> - if (!page)
>> - return NULL;
>> -
>> - memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
>> - page_off += *len;
>> -
>> - while (--*num_buf) {
>> - unsigned int buflen;
>> - void *buf;
>> - int off;
>> -
>> - buf = virtqueue_get_buf(rq->vq, &buflen);
>> - if (unlikely(!buf))
>> - goto err_buf;
>> -
>> - p = virt_to_head_page(buf);
>> - off = buf - page_address(p);
>> -
>> - /* guard against a misconfigured or uncooperative backend that
>> - * is sending packet larger than the MTU.
>> - */
>> - if ((page_off + buflen) > PAGE_SIZE) {
>> - put_page(p);
>> - goto err_buf;
>> - }
>> -
>> - memcpy(page_address(page) + page_off,
>> - page_address(p) + off, buflen);
>> - page_off += buflen;
>> - put_page(p);
>> - }
>> -
>> - /* Headroom does not contribute to packet length */
>> - *len = page_off - VIRTIO_XDP_HEADROOM;
>> - return page;
>> -err_buf:
>> - __free_pages(page, 0);
>> - return NULL;
>> -}
>> -
>> static struct sk_buff *receive_mergeable(struct net_device *dev,
>> struct virtnet_info *vi,
>> struct receive_queue *rq,
>> @@ -577,6 +601,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>> struct sk_buff *head_skb, *curr_skb;
>> struct bpf_prog *xdp_prog;
>> unsigned int truesize;
>> + unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>
>> head_skb = NULL;
>>
>> @@ -589,10 +614,13 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>> u32 act;
>>
>> /* This happens when rx buffer size is underestimated */
>> - if (unlikely(num_buf > 1)) {
>> + if (unlikely(num_buf > 1 ||
>> + headroom < virtnet_get_headroom(vi))) {
>> /* linearize data for XDP */
>> xdp_page = xdp_linearize_page(rq, &num_buf,
>> - page, offset, &len);
>> + page, offset,
>> + VIRTIO_XDP_HEADROOM,
>> + &len);
>> if (!xdp_page)
>> goto err_xdp;
>> offset = VIRTIO_XDP_HEADROOM;
>> @@ -830,7 +858,6 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>> err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>> if (err < 0)
>> put_page(virt_to_head_page(buf));
>> -
>> return err;
>> }
>>
>> @@ -1834,7 +1861,6 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>> }
>>
>> static int init_vqs(struct virtnet_info *vi);
>> -static void _remove_vq_common(struct virtnet_info *vi);
>>
>> static int virtnet_restore_up(struct virtio_device *vdev)
>> {
>> @@ -1863,39 +1889,6 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>> return err;
>> }
>>
>> -static int virtnet_reset(struct virtnet_info *vi, int curr_qp, int xdp_qp)
>> -{
>> - struct virtio_device *dev = vi->vdev;
>> - int ret;
>> -
>> - virtio_config_disable(dev);
>> - dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
>> - virtnet_freeze_down(dev);
>> - _remove_vq_common(vi);
>> -
>> - virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>> - virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>> -
>> - ret = virtio_finalize_features(dev);
>> - if (ret)
>> - goto err;
>> -
>> - vi->xdp_queue_pairs = xdp_qp;
>> - ret = virtnet_restore_up(dev);
>> - if (ret)
>> - goto err;
>> - ret = _virtnet_set_queues(vi, curr_qp);
>> - if (ret)
>> - goto err;
>> -
>> - virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>> - virtio_config_enable(dev);
>> - return 0;
>> -err:
>> - virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>> - return ret;
>> -}
>> -
>> static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>> struct netlink_ext_ack *extack)
>> {
>> @@ -1942,35 +1935,31 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>> return PTR_ERR(prog);
>> }
>>
>> - /* Changing the headroom in buffers is a disruptive operation because
>> - * existing buffers must be flushed and reallocated. This will happen
>> - * when a xdp program is initially added or xdp is disabled by removing
>> - * the xdp program resulting in number of XDP queues changing.
>> + /* synchronize with NAPI which may do XDP_TX based on queue
>> + * pair numbers.
> I think you mean
>
> /* Make sure NAPI is not using any XDP TX queues for RX. */
>
> is that it?
Yes.
>
>> - if (vi->xdp_queue_pairs != xdp_qp) {
>> - err = virtnet_reset(vi, curr_qp + xdp_qp, xdp_qp);
>> - if (err) {
>> - dev_warn(&dev->dev, "XDP reset failure.\n");
>> - goto virtio_reset_err;
>> - }
>> - }
>> + for (i = 0; i < vi->max_queue_pairs; i++)
>> + napi_disable(&vi->rq[i].napi);
>>
> This is pretty slow if queues are busy. Should we avoid this for queues
> which aren't affected?
The problem is that we attach the XDP prog to all RX queues.
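The reset-free flow under discussion can be modeled as a toy sequential program (illustrative names only; no real NAPI or RCU semantics): quiesce NAPI on every RX queue, then swap the program pointer per queue and re-enable.

```c
#include <assert.h>
#include <stddef.h>

#define NQ 4

/* Toy model of the new virtnet_xdp_set() sequence. */
struct rq {
	int napi_on;
	const void *xdp_prog;
};

static int xdp_set(struct rq *rq, int n, const void *prog)
{
	int i;

	/* Make sure NAPI is not running on any queue first. */
	for (i = 0; i < n; i++)
		rq[i].napi_on = 0;

	for (i = 0; i < n; i++) {
		/* NAPI must be off while the prog pointer is swapped. */
		assert(!rq[i].napi_on);
		rq[i].xdp_prog = prog;
		rq[i].napi_on = 1;  /* the driver uses virtnet_napi_enable() */
	}
	return 0;
}
```

In the real driver the disable step is what makes this slow on busy queues, which is the cost being weighed against a full device reset.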
>
>> netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
>> + err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
>> + if (err)
>> + goto err;
>> + vi->xdp_queue_pairs = xdp_qp;
>>
>> for (i = 0; i < vi->max_queue_pairs; i++) {
>> old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>> rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
>> if (old_prog)
>> bpf_prog_put(old_prog);
>> + napi_enable(&vi->rq[i].napi);
> This seems racy. See comment around virtnet_napi_enable.
Right, will use virtnet_napi_enable() instead.
Thanks
>
>> }
>>
>> return 0;
>>
>> -virtio_reset_err:
>> - /* On reset error do our best to unwind XDP changes inflight and return
>> - * error up to user space for resolution. The underlying reset hung on
>> - * us so not much we can do here.
>> - */
>> +err:
>> + for (i = 0; i < vi->max_queue_pairs; i++)
>> + napi_enable(&vi->rq[i].napi);
>> if (prog)
>> bpf_prog_sub(prog, vi->max_queue_pairs - 1);
>> return err;
>> @@ -2614,15 +2603,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>> return err;
>> }
>>
>> -static void _remove_vq_common(struct virtnet_info *vi)
>> -{
>> - vi->vdev->config->reset(vi->vdev);
>> - free_unused_bufs(vi);
>> - _free_receive_bufs(vi);
>> - free_receive_page_frags(vi);
>> - virtnet_del_vqs(vi);
>> -}
>> -
>> static void remove_vq_common(struct virtnet_info *vi)
>> {
>> vi->vdev->config->reset(vi->vdev);
>> --
>> 2.7.4