Date:   Tue, 7 Feb 2017 10:23:23 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     John Fastabend <john.fastabend@...il.com>, kubakici@...pl,
        ast@...com, mst@...hat.com
Cc:     john.r.fastabend@...el.com, netdev@...r.kernel.org
Subject: Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head



On 2017-02-07 03:29, John Fastabend wrote:
> On 17-02-05 11:08 PM, Jason Wang wrote:
>>
>> On 2017-02-03 11:16, John Fastabend wrote:
>>> Add support for XDP adjust head by allocating a 256B header region
>>> that XDP programs can grow into. This is only enabled when an XDP
>>> program is loaded.
>>>
>>> In order to ensure that we do not have to unwind queue headroom,
>>> push queue setup below bpf_prog_add. It reads better to do a prog
>>> ref unwind vs another queue setup call.
>>>
>>> At the moment this code must do a full reset to ensure old buffers
>>> without headroom on program add, or with headroom on program removal,
>>> are not used incorrectly in the datapath. Ideally we would only
>>> have to disable/enable the RX queues being updated, but there is no
>>> API to do this at the moment in virtio, so use the big hammer. In
>>> practice it is likely not that big of a problem as this will only
>>> happen when XDP is enabled/disabled; changing programs does not
>>> require the reset. There is some risk that the driver may either
>>> have an allocation failure or for some reason fail to correctly
>>> negotiate with the underlying backend; in this case the driver will
>>> be left uninitialized. I have not seen this ever happen on my test
>>> systems, and for what it's worth this same failure case can occur
>>> from probe and other contexts in the virtio framework.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@...el.com>
>>> ---
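
[For readers of the archive: a minimal XDP program that consumes this new
headroom via bpf_xdp_adjust_head() might look like the sketch below. The
program name and the 16-byte delta are illustrative, not part of the patch.]

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>	/* SEC(), bpf_xdp_adjust_head() */

SEC("xdp")
int xdp_grow_head(struct xdp_md *ctx)
{
	void *data, *data_end;

	/* Grow the packet 16 bytes into the driver-provided headroom;
	 * the helper fails if the driver reserved too little (hence
	 * the 256B region this patch allocates).
	 */
	if (bpf_xdp_adjust_head(ctx, -16))
		return XDP_ABORTED;

	/* Reload and re-check bounds after the adjust, as the verifier
	 * requires.
	 */
	data = (void *)(long)ctx->data;
	data_end = (void *)(long)ctx->data_end;
	if (data + 16 > data_end)
		return XDP_ABORTED;

	__builtin_memset(data, 0, 16);
	return XDP_PASS;
}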
>
> [...]
>
>>> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>        struct bpf_prog *xdp_prog;
>>>          len -= vi->hdr_len;
>>> -    skb_trim(skb, len);
>>>          rcu_read_lock();
>>>        xdp_prog = rcu_dereference(rq->xdp_prog);
>>> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>            if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>>>                goto err_xdp;
>>>    -        xdp.data = skb->data;
>>> +        xdp.data_hard_start = skb->data;
>>> +        xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>>>            xdp.data_end = xdp.data + len;
>>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>              switch (act) {
>>>            case XDP_PASS:
>>> +            /* Recalculate length in case bpf program changed it */
>>> +            __skb_pull(skb, xdp.data - xdp.data_hard_start);
>> But skb->len was trimmed to len below, which seems wrong.
> I believe this is correct and it passes my basic iperf/ping tests.
>
> When we are using small buffers with XDP, skb->data is pointing to the front
> of the buffer. This space includes the XDP headroom. When we pass the skb up
> to the stack we need to pull this off and point to the start of the data. But
> there still is likely a bunch of room at the end of the buffer assuming the
> packet is smaller than the buffer size.
>
>>> +            len = xdp.data_end - xdp.data;
>>>                break;
>>>            case XDP_TX:
>>>                if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
>>> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>        }
>>>        rcu_read_unlock();
>>>    +    skb_trim(skb, len);
> So here we trim the packet to set the length to the actual payload size. The
> 'len' parameter passed into receive_small does not include the headroom, so this
> gives us the correct length of the payload.
>
> Make sense?

Yes, you are right.
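
To spell the arithmetic out for the archive (all names are from this
patch; the diagram is only an illustration):

/* Layout handed to the program:
 *
 *   skb->data                       skb->data + VIRTIO_XDP_HEADROOM
 *   |                               |
 *   v                               v
 *   +----- VIRTIO_XDP_HEADROOM -----+------ payload (len bytes) ------+
 *   ^                               ^
 *   xdp.data_hard_start             xdp.data, data_end = data + len
 */
xdp.data_hard_start = skb->data;
xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
xdp.data_end = xdp.data + len;

/* On XDP_PASS: pull off whatever headroom is still unused, then trim
 * to the (possibly adjusted) payload. With no adjust_head() call this
 * pulls exactly VIRTIO_XDP_HEADROOM and trims to the original len.
 */
__skb_pull(skb, xdp.data - xdp.data_hard_start);
len = xdp.data_end - xdp.data;
skb_trim(skb, len);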

>
>>>        return skb;
>>>      err_xdp:
> [...]
>
>>> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>                                  page, offset, &len);
>>>                if (!xdp_page)
>>>                    goto err_xdp;
>>> -            offset = 0;
>>> +            offset = VIRTIO_XDP_HEADROOM;
>>>            } else {
>>>                xdp_page = page;
>>>            }
>>> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>            if (unlikely(hdr->hdr.gso_type))
>>>                goto err_xdp;
>>>    +        /* Allow consuming headroom but reserve enough space to push
>>> +         * the descriptor on if we get an XDP_TX return code.
>>> +         */
>>>            data = page_address(xdp_page) + offset;
>>> +        xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>> Should be data - VIRTIO_XDP_HEADROOM I think?
>>
> If the XDP program does an adjust_head() and then an XDP_TX, I want to ensure
> we reserve enough headroom to push the header onto the buffer when the packet
> is sent. So the additional hdr_len reserve here is intentional. Otherwise we
> would need to detect this and do some type of linearize action.

I get the point.
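
For the archive, here is the resulting layout spelled out. All names are
from the patch; the comments assume the copied-page case, where
offset == VIRTIO_XDP_HEADROOM:

data = page_address(xdp_page) + offset;        /* -> vnet header */

/* data_hard_start lands hdr_len bytes into the page. Even if the
 * program grows the packet all the way down to data_hard_start,
 * hdr_len bytes remain in front of it, enough for XDP_TX to push
 * the virtio-net header back on without any linearize step.
 */
xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
xdp.data = data + vi->hdr_len;                 /* -> packet start */
xdp.data_end = xdp.data + (len - vi->hdr_len);

/* Headroom visible to the program:
 *   xdp.data - xdp.data_hard_start == VIRTIO_XDP_HEADROOM
 */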

>
>>>            xdp.data = data + vi->hdr_len;
>>>            xdp.data_end = xdp.data + (len - vi->hdr_len);
>>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>              switch (act) {
>>>            case XDP_PASS:
>>> +            /* recalculate offset to account for any header
>>> +             * adjustments. Note other cases do not build an
>>> +             * skb and avoid using offset
>>> +             */
>>> +            offset = xdp.data -
>>> +                    page_address(xdp_page) - vi->hdr_len;
>>> +
>>>                /* We can only create skb based on xdp_page. */
>>>                if (unlikely(xdp_page != page)) {
>>>                    rcu_read_unlock();
>>>                    put_page(page);
>>>                    head_skb = page_to_skb(vi, rq, xdp_page,
>>> -                               0, len, PAGE_SIZE);
>>> +                               offset, len, PAGE_SIZE);
>>>                    ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>>>                    return head_skb;
> [...]
>
>>>    -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>>> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>> +                 struct receive_queue *rq, gfp_t gfp)
>>>    {
>>>        struct page_frag *alloc_frag = &rq->alloc_frag;
>>> +    unsigned int headroom = virtnet_get_headroom(vi);
>>>        char *buf;
>>>        unsigned long ctx;
>>>        int err;
>>>        unsigned int len, hole;
>>>          len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
>>> -    if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>>> +    if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>>>            return -ENOMEM;
>>>          buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>>> +    buf += headroom; /* advance address leaving hole at front of pkt */
>> Note: the headroom will reduce the possibility of frag coalescing, which
>> may hurt performance somewhat.
>>
>> [...]
> Right, there are a few other performance optimizations I am looking at in
> virtio as well, but these should go in as a follow-on series.
>
> Specifically, I'm looking at recycling buffers to see what sort of performance
> increase we can get out of that. Many of the hardware drivers do this and see
> a performance boost from it. However, dynamic buffer sizes like this make it a
> bit challenging.
>
> .John

Right.
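
To illustrate the coalescing note with the new fill path (the code is
from the patch; the diagram is only a sketch of the concern):

len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
	return -ENOMEM;

buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
buf += headroom;	/* advance address leaving hole at front of pkt */

/* Successive buffers carved from one page frag used to sit back to
 * back:
 *
 *     [buf0][buf1][buf2] ...
 *
 * With the per-buffer hole they become:
 *
 *     [hole][buf0][hole][buf1] ...
 *
 * so adjacent completions are no longer contiguous in memory, which
 * is what reduces the chance of merging them into one frag.
 */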

Acked-by: Jason Wang <jasowang@...hat.com>

Thanks
