Message-ID: <ef4ca8d3-3127-f6dd-032a-e04d367fd49c@redhat.com>
Date: Tue, 22 Aug 2023 15:05:45 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Yunsheng Lin <linyunsheng@...wei.com>,
Jesper Dangaard Brouer <jbrouer@...hat.com>,
Liang Chen <liangchen.linux@...il.com>, hawk@...nel.org, horms@...nel.org,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com
Cc: brouer@...hat.com, ilias.apalodimas@...aro.org, daniel@...earbox.net,
ast@...nel.org, netdev@...r.kernel.org,
Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
Stanislav Fomichev <sdf@...gle.com>, Maryam Tahhan <mtahhan@...hat.com>
Subject: Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
On 22/08/2023 14.24, Yunsheng Lin wrote:
> On 2023/8/22 5:54, Jesper Dangaard Brouer wrote:
>> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote:
>>>
>>> On 16/08/2023 14.30, Liang Chen wrote:
>>>> Page pool is supported for veth, but at the moment pages are not properly
>>>> recycled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
>>>> leveraging the advantages of the page pool. So this RFC patchset is mainly
>>>> to make recycling work for those cases. With that in place, it can be
>>>> further optimized by utilizing the napi skb cache. Detailed figures are
>>>> presented in each commit message, and together they demonstrate a quite
>>>> noticeable improvement.
>>>>
>>>
>>> I'm digging into this code path today.
>>>
>>> I'm trying to extend this and find a way to support SKBs that used
>>> kmalloc (skb->head_frag=0), such that we can remove the
>>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
>>> allow more SKBs to avoid a realloc, as long as they have enough headroom.
>>> We can control that headroom dynamically for netdev TX-packets by
>>> adjusting netdev->needed_headroom, e.g. when loading an XDP prog.
>>>
>>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
>>> handle kmalloc'ed SKBs (skb->head_frag=0). Going through the code, I don't
>>> think it is a bug that generic-XDP allows this.
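
To make the needed_headroom idea above more concrete, here is a rough
sketch of what I have in mind (not code from the patchset; the helper name
and the 192-byte constant are only assumptions for illustration):

#include <linux/netdevice.h>

#define VETH_XDP_HEADROOM 192	/* assumed minimum, mirroring common native-XDP drivers */

/* Hypothetical helper: bump needed_headroom while an XDP prog is attached,
 * so locally generated TX SKBs are allocated with enough headroom and
 * veth_convert_skb_to_xdp_buff() can skip the reallocation path.
 */
static void veth_set_xdp_headroom(struct net_device *dev, bool xdp_attached)
{
	dev->needed_headroom = xdp_attached ? VETH_XDP_HEADROOM : 0;
}

(Resetting to 0 on detach is obviously too simplistic; a real patch would
restore the previous value.)
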
>
> Is it possible to relax other checks too, and implement something like
> pskb_expand_head() in the xdp core, in case the xdp core needs to modify the data?
>
Yes, I definitely hope (and plan) to relax other checks.
The XDP_PACKET_HEADROOM (256 bytes) check has IMHO become obsolete and
wrong, as many drivers today use a headroom of 192 bytes for XDP (which we
allowed). Thus, there is no reason for veth to insist on the
XDP_PACKET_HEADROOM limit. Today XDP can handle variable headroom (due
to these drivers).
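
As a sketch of what relaxing the veth check could look like (again not
real code; VETH_XDP_MIN_HEADROOM is a made-up constant for illustration):

#include <linux/skbuff.h>

#define VETH_XDP_MIN_HEADROOM 192	/* hypothetical relaxed minimum */

/* Accept variable headroom instead of insisting on the full
 * XDP_PACKET_HEADROOM (256 bytes), matching what native XDP already
 * tolerates for drivers that reserve less.
 */
static bool veth_skb_headroom_ok(const struct sk_buff *skb)
{
	return skb_headroom(skb) >= VETH_XDP_MIN_HEADROOM;
}
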
>
>>>
>>> Deep into this rabbit hole, I start to question our approach.
>>> - Perhaps the veth XDP approach for SKBs is wrong?
>>>
>>> The root-cause of this issue is that the veth_xdp_rcv_skb() code path
>>> (that handles SKBs) is calling the XDP-native function "xdp_do_redirect()".
>>> I question why it isn't using "xdp_do_generic_redirect()"?
>>> (I will jump into this rabbit hole now...)
>
> Is there any reason why xdp_do_redirect() cannot handle slab-allocated
> data? Can we change xdp_do_redirect() to handle slab-allocated
> data, so that it can benefit other cases besides veth too?
>
I started coding this up, but realized it was the wrong approach.
The xdp_do_redirect() call is for native-XDP with a proper xdp_buff.
When dealing with SKBs that we pretend are an xdp_buff, we have the API
xdp_do_generic_redirect(). IMHO it is wrong to "steal" the packet-data
from an SKB just in order to use the native-XDP API xdp_do_redirect().
In the use-cases I see, the next layer will often allocate a new SKB and
attach the stolen packet-data, which is pure waste, as
xdp_do_generic_redirect() keeps the SKB intact, so no new SKB allocs are
needed.
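
To illustrate the difference (only a sketch; the wrapper function below is
hypothetical, while the two redirect calls and their signatures are the
existing ones from include/linux/filter.h):

#include <linux/filter.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/xdp.h>

static int veth_redirect_skb_sketch(struct net_device *dev,
				    struct sk_buff *skb,
				    struct xdp_buff *xdp,
				    struct bpf_prog *prog)
{
	/* Generic-XDP style: the SKB stays intact and travels with the
	 * redirect, so the receiving side does not need to allocate a
	 * new SKB around "stolen" packet-data.
	 */
	return xdp_do_generic_redirect(dev, skb, xdp, prog);

	/* Native-XDP style (what veth_xdp_rcv_skb() does today): only the
	 * xdp_buff is passed along, detaching the packet-data from the
	 * SKB, and the next layer typically builds a fresh SKB around it:
	 *
	 *	return xdp_do_redirect(dev, xdp, prog);
	 */
}
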
--Jesper