Message-ID: <4100A0D0-2169-4B51-A7A3-1511E879D365@redhat.com>
Date: Tue, 07 Dec 2021 15:52:50 +0100
From: Eelco Chaudron <echaudro@...hat.com>
To: Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
John Fastabend <john.fastabend@...il.com>
Cc: Lorenzo Bianconi <lorenzo@...nel.org>, bpf@...r.kernel.org,
netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
ast@...nel.org, daniel@...earbox.net, shayagr@...zon.com,
dsahern@...nel.org, brouer@...hat.com, jasowang@...hat.com,
alexander.duyck@...il.com, saeed@...nel.org,
maciej.fijalkowski@...el.com, magnus.karlsson@...el.com,
tirthendu.sarkar@...el.com, toke@...hat.com
Subject: Re: [PATCH v19 bpf-next 12/23] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
On 6 Dec 2021, at 16:07, Lorenzo Bianconi wrote:
>> Lorenzo Bianconi wrote:
>>> From: Eelco Chaudron <echaudro@...hat.com>
>>>
>>> This change adds support for tail growing and shrinking for XDP multi-buff.
>>>
>>> When called on a multi-buffer packet with a grow request, it will work
>>> on the last fragment of the packet. So the maximum grow size is the
>>> last fragment's tailroom, i.e. no new buffer will be allocated.
>>> An XDP multi-buffer capable driver is expected to set frag_size in the
>>> xdp_rxq_info data structure to notify the XDP core of the fragment
>>> size. A frag_size of 0 is interpreted by the XDP core as meaning tail
>>> growing is not allowed. Introduce the __xdp_rxq_info_reg() utility
>>> routine to initialize the frag_size field.
>>>
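For context, a driver opting into multi-buffer support would pass its
per-fragment size when registering the rxq. A minimal sketch, assuming a
hypothetical mydrv_ring structure (only the __xdp_rxq_info_reg() call
itself comes from this series; the driver fields are illustrative):

#include <net/xdp.h>

/* Hypothetical driver ring; field names are made up for this sketch. */
struct mydrv_ring {
	struct xdp_rxq_info xdp_rxq;
	struct net_device *netdev;
	u32 queue_index;
	unsigned int napi_id;
};

static int mydrv_setup_rxq(struct mydrv_ring *ring)
{
	/* frag_size > 0 tells the XDP core how much room each fragment
	 * page offers, so bpf_xdp_adjust_tail() can grow into the last
	 * fragment's tailroom. Passing 0 (what plain xdp_rxq_info_reg()
	 * does) means tail growing on fragments is not allowed.
	 */
	return __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
				  ring->queue_index, ring->napi_id,
				  PAGE_SIZE);
}
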
>>> When shrinking, it will work from the last fragment all the way down
>>> to the base buffer, depending on the shrink size. It's important to
>>> mention that once you shrink, the fragment(s) are freed, so you cannot
>>> grow back to the original size.
>>>
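From the BPF program side the helper interface is unchanged; a negative
delta shrinks the packet, working across fragments as described above. A
minimal sketch (the 400-byte delta is an arbitrary example):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_trim_tail(struct xdp_md *ctx)
{
	/* Remove 400 bytes from the tail; on a multi-buffer packet this
	 * trims the last fragment first and frees it if it is consumed
	 * entirely.
	 */
	if (bpf_xdp_adjust_tail(ctx, -400) < 0)
		return XDP_DROP;

	return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
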
>>> Acked-by: Jakub Kicinski <kuba@...nel.org>
>>> Co-developed-by: Lorenzo Bianconi <lorenzo@...nel.org>
>>> Signed-off-by: Lorenzo Bianconi <lorenzo@...nel.org>
>>> Signed-off-by: Eelco Chaudron <echaudro@...hat.com>
>>> ---
<SNIP>
>>> +static int bpf_xdp_mb_shrink_tail(struct xdp_buff *xdp, int offset)
>>> +{
>>> + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
>>> + int i, n_frags_free = 0, len_free = 0;
>>> +
>>> + if (unlikely(offset > (int)xdp_get_buff_len(xdp) - ETH_HLEN))
>>> + return -EINVAL;
>>> +
>>> + for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
>>> + skb_frag_t *frag = &sinfo->frags[i];
>>> + int size = skb_frag_size(frag);
>>> + int shrink = min_t(int, offset, size);
>>> +
>>> + len_free += shrink;
>>> + offset -= shrink;
>>> +
>>> + if (unlikely(size == shrink)) {
>>
>> not so sure about the unlikely.
>
> I will let Eelco comment on it since he is the author of the patch.
My reasoning for adding the unlikely was as follows (assuming 4K pages and 256 bytes of headroom):
When jumbo frames are enabled, most frames are either below 4K or at the maximum size of around 9100 bytes. Assuming the latter, the frame spans roughly 3 pages: (4096-256), (4096), and (652). Meaning we can shrink 652 bytes before hitting this case, which I think might not be hit often.
But maybe you have another use case in mind that would warrant removing the unlikely. I do not have a strong opinion on keeping it.
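
To spell out the arithmetic behind the numbers above (a rough sketch,
ignoring any per-buffer skb_shared_info reservation):

	base buffer: 4096 - 256 (headroom) = 3840 bytes
	fragment 1:  4096 bytes
	fragment 2:  652 bytes

A shrink of fewer than 652 bytes only trims fragment 2 in place; shrinking
652 bytes or more consumes the whole fragment and takes the size == shrink
branch.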
>>> + struct page *page = skb_frag_page(frag);
>>> +
>>> + __xdp_return(page_address(page), &xdp->rxq->mem,
>>> + false, NULL);
>>> + n_frags_free++;
>>> + } else {
>>> + skb_frag_size_set(frag, size - shrink);
>>
<SNIP>