Message-ID: <f8d6dbe0-b213-4990-a8af-2f95d25d21be@redhat.com>
Date: Thu, 27 Nov 2025 18:58:18 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
bjorn@...nel.org, magnus.karlsson@...el.com, maciej.fijalkowski@...el.com,
jonathan.lemon@...il.com, sdf@...ichev.me, ast@...nel.org,
daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com,
bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path

On 11/27/25 1:49 PM, Jason Xing wrote:
> On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@...hat.com> wrote:
>> On 11/25/25 12:57 PM, Jason Xing wrote:
>>> This patch also removes a total of ~4% overhead, which can be observed
>>> with perf:
>>>    |--2.97%--validate_xmit_skb
>>>    |          |
>>>    |           --1.76%--netif_skb_features
>>>    |                     |
>>>    |                      --0.65%--skb_network_protocol
>>>    |
>>>    |--1.06%--validate_xmit_xfrm
>>>
>>> The above result has been verified on different NICs, like I40E. I
>>> saw the number go up by 4%.
>>
>> I must admit this delta is surprising, and does not fit my experience in
>> slightly different scenarios with the plain UDP TX path.
>
> My take is that when the path is extremely hot, even simple arithmetic
> can cause unexpected overhead. You can see the pps is now over
> 2,000,000. I say this because I've run a few similar tests to verify
> this idea.
Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
(Spectre and friends), reasonably good H/W (from around 2016) could do
~2Mpps with a single plain UDP socket.

Also, validate_xmit_xfrm() should be basically a no-op; possibly some bad
luck with the icache?

Could you please try the attached patch instead?
It should not be as good as skipping the whole validation, but it should
give some measurable gain.
>>> [1] - analysis of the validate_xmit_skb()
>>> 1. validate_xmit_unreadable_skb()
>>> xsk doesn't initialize skb->unreadable, so the function will not free
>>> the skb.
>>> 2. validate_xmit_vlan()
>>> xsk also doesn't initialize skb->vlan_all.
>>> 3. sk_validate_xmit_skb()
>>> skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
>>> sk_state, so the skb will not be validated.
>>> 4. netif_needs_gso()
>>> af_xdp doesn't support gso/tso.
>>> 5. skb_needs_linearize() && __skb_linearize()
>>> skb doesn't have frag_list as always, so skb_has_frag_list() returns
>>> false. In copy mode, skb can put more data in the frags[] that can be
>>> found in xsk_build_skb_zerocopy().
>>
>> I'm not sure I parse this last sentence correctly, could you please
>> re-phrase?
>>
>> I read it as: the xsk xmit path could build skbs with nr_frags > 0.
>> That in turn will need validation from
>> validate_xmit_skb()/skb_needs_linearize() depending on the egress device
>> (lack of NETIF_F_SG), regardless of any other offload required.
>
> There are two paths where the allocation of frags happens:
> 1) xsk_build_skb() -> xsk_build_skb_zerocopy() -> skb_fill_page_desc()
> -> shinfo->frags[i]
> 2) xsk_build_skb() -> skb_add_rx_frag() -> ... -> shinfo->frags[i]
>
> Neither of them touches skb->frag_list, which means frag_list is NULL.
> IIUC, there is no place where frag_list is used (which I actually
> tested). We can see that skb_needs_linearize() needs to check
> skb_has_frag_list() first, so it will not proceed after seeing it
> return false.
https://elixir.bootlin.com/linux/v6.18-rc7/source/include/linux/skbuff.h#L4322

    return skb_is_nonlinear(skb) &&
           ((skb_has_frag_list(skb) && !(features & NETIF_F_FRAGLIST)) ||
            (skb_shinfo(skb)->nr_frags && !(features & NETIF_F_SG)));

can return true even if `!skb_has_frag_list(skb)`.

I think you still need to call validate_xmit_skb().
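
To make the point concrete, here is a rough userspace sketch of that predicate. It is only a model, not kernel code: struct model_skb, the MODEL_F_* bits and the model_* helpers are made-up stand-ins for the skb fields and NETIF_F_* flags the expression reads, and the bit values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits only; the real NETIF_F_* values differ. */
#define MODEL_F_SG        (1u << 0)
#define MODEL_F_FRAGLIST  (1u << 1)

/* Hypothetical stand-in for the few skb fields the predicate reads. */
struct model_skb {
    unsigned int data_len;  /* bytes held outside the linear area */
    unsigned int nr_frags;  /* entries used in shinfo->frags[] */
    bool has_frag_list;     /* shinfo->frag_list != NULL */
};

/* Mirrors skb_is_nonlinear(): true when data_len is non-zero. */
static bool model_is_nonlinear(const struct model_skb *skb)
{
    return skb->data_len != 0;
}

/* Same boolean shape as the skb_needs_linearize() expression quoted above. */
static bool model_needs_linearize(const struct model_skb *skb,
                                  uint32_t features)
{
    return model_is_nonlinear(skb) &&
           ((skb->has_frag_list && !(features & MODEL_F_FRAGLIST)) ||
            (skb->nr_frags && !(features & MODEL_F_SG)));
}

/* Usage: an skb with frags[] populated and no frag_list. */
void model_self_check(void)
{
    struct model_skb frags_only = {
        .data_len = 4096, .nr_frags = 1, .has_frag_list = false,
    };

    /* Device without scatter-gather: linearization is needed. */
    assert(model_needs_linearize(&frags_only, 0));
    /* Device with scatter-gather: frags[] can go out as-is. */
    assert(!model_needs_linearize(&frags_only, MODEL_F_SG));
}
```

In this model the frag_list term never fires, yet the second term alone makes the predicate true on a device lacking the SG bit.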
/P
View attachment "sec_path.patch" of type "text/x-patch" (383 bytes)