Message-ID: <CAL+tcoAY5uaYRC2EyMQTn+Hjb62KKD1DRyymW+M27BT=n+MUOw@mail.gmail.com>
Date: Fri, 28 Nov 2025 09:44:49 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
bjorn@...nel.org, magnus.karlsson@...el.com, maciej.fijalkowski@...el.com,
jonathan.lemon@...il.com, sdf@...ichev.me, ast@...nel.org,
daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com,
bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path

On Fri, Nov 28, 2025 at 1:58 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 11/27/25 1:49 PM, Jason Xing wrote:
> > On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@...hat.com> wrote:
> >> On 11/25/25 12:57 PM, Jason Xing wrote:
> >>> This patch also removes total ~4% consumption which can be observed
> >>> by perf:
> >>> |--2.97%--validate_xmit_skb
> >>> | |
> >>> | --1.76%--netif_skb_features
> >>> | |
> >>> | --0.65%--skb_network_protocol
> >>> |
> >>> |--1.06%--validate_xmit_xfrm
> >>>
> >>> The above result has been verified on different NICs, like I40E. I
> >>> managed to see the number is going up by 4%.
> >>
> >> I must admit this delta is surprising, and does not fit my experience in
> >> slightly different scenarios with the plain UDP TX path.
> >
> > My take is that when the path is extremely hot, even the mathematics
> > calculation could cause unexpected overhead. You can see the pps is
> > now over 2,000,000. The reason why I say this is because I've done a
> > few similar tests to verify this thought.
>
> Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
> (spectre and friends) reasonable good H/W (2016 old) could do ~2Mpps
> with a single plain UDP socket.
That's an interesting number I wasn't aware of. Thanks.
But for now it's really hard for xsk (in copy mode) to exceed 2M pps,
even with some recent optimizations applied. How did you test UDP?
Could you share the benchmark here?
IMHO, xsk should not be slower than a plain UDP socket, so I think
there is a lot of room for xsk to improve...
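
For scale, here is a rough per-packet budget at the rate mentioned
above. This is only back-of-the-envelope arithmetic: it assumes a
single core sustaining exactly 2,000,000 pps and that a perf percentage
maps linearly onto wall time, neither of which I have measured. The
helper names are made up for illustration.

```c
/* Time available per packet, in nanoseconds, at a given packet rate. */
static double ns_per_packet(double pps)
{
	return 1e9 / pps;
}

/* Nanoseconds per packet attributed to a code path that accounts for
 * `percent` of the profile (assumes percentage ~ share of wall time).
 */
static double path_cost_ns(double pps, double percent)
{
	return ns_per_packet(pps) * percent / 100.0;
}
```

Under those assumptions, 2M pps leaves about 500 ns per packet, so a
~4% path is on the order of 20 ns per packet.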
>
> Also validate_xmit_xfrm() should be basically a no-op, possibly some bad
> luck with icache?
Maybe. I strongly feel that I need to work on the layout of those structures.
>
> Could you please try the attached patch instead?
Yep, I tried it, but I didn't see any improvement.
>
> Should not be as good as skipping the whole validation but should give
> some measurable gain.
> >>> [1] - analysis of the validate_xmit_skb()
> >>> 1. validate_xmit_unreadable_skb()
> >>> xsk doesn't initialize skb->unreadable, so the function will not free
> >>> the skb.
> >>> 2. validate_xmit_vlan()
> >>> xsk also doesn't initialize skb->vlan_all.
> >>> 3. sk_validate_xmit_skb()
> >>> skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
> >>> sk_state, so the skb will not be validated.
> >>> 4. netif_needs_gso()
> >>> af_xdp doesn't support gso/tso.
> >>> 5. skb_needs_linearize() && __skb_linearize()
> >>> skb doesn't have frag_list as always, so skb_has_frag_list() returns
> >>> false. In copy mode, skb can put more data in the frags[] that can be
> >>> found in xsk_build_skb_zerocopy().
> >>
> >> I'm not sure I parse this last sentence correctly, could you please
> >> re-phrase?
> >>
> >> I read it as the xsk xmit path could build skb with nr_frags > 0.
> >> That in turn will need validation from
> >> validate_xmit_skb()/skb_needs_linearize() depending on the egress device
> >> (lack of NETIF_F_SG), regardless of any other offload required.
> >
> > There are two paths where the allocation of frags happen:
> > 1) xsk_build_skb() -> xsk_build_skb_zerocopy() -> skb_fill_page_desc()
> > -> shinfo->frags[i]
> > 2) xsk_build_skb() -> skb_add_rx_frag() -> ... -> shinfo->frags[i]
> >
> > Neither of them touches skb->frag_list, which means frag_list is NULL.
> > IIUC, there is no place where frag_list is used (which I actually
> > tested). We can see skb_needs_linearize() needs to check
> > skb_has_frag_list() first, so it will not proceed after seeing it
> > return false.
> https://elixir.bootlin.com/linux/v6.18-rc7/source/include/linux/skbuff.h#L4322
>
> return skb_is_nonlinear(skb) &&
> ((skb_has_frag_list(skb) && !(features & NETIF_F_FRAGLIST)) ||
> (skb_shinfo(skb)->nr_frags && !(features & NETIF_F_SG)));
>
> can return true even if `!skb_has_frag_list(skb)`.
Oh well, indeed, I missed the nr_frags condition.
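
To make that concrete, here is a small user-space model of the
condition quoted above. It is not kernel code: struct model_skb,
model_needs_linearize() and the MODEL_F_* bit values are made up for
illustration, and only the boolean structure mirrors
skb_needs_linearize().

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real NETIF_F_* flags are defined
 * in include/linux/netdev_features.h.
 */
#define MODEL_F_SG       (1ULL << 0)
#define MODEL_F_FRAGLIST (1ULL << 1)

/* Hypothetical stand-in for the few skb fields the check looks at. */
struct model_skb {
	unsigned int data_len;  /* bytes held outside the linear area */
	unsigned int nr_frags;  /* entries used in shinfo->frags[] */
	bool has_frag_list;     /* shinfo->frag_list != NULL */
};

/* Same shape as the quoted expression: a non-linear skb needs to be
 * linearized if it uses a frag_list the device cannot handle, OR if it
 * uses frags[] and the device lacks scatter-gather.
 */
static bool model_needs_linearize(const struct model_skb *skb,
				  uint64_t features)
{
	bool nonlinear = skb->data_len != 0;

	return nonlinear &&
	       ((skb->has_frag_list && !(features & MODEL_F_FRAGLIST)) ||
		(skb->nr_frags && !(features & MODEL_F_SG)));
}
```

With nr_frags = 1, has_frag_list = false and a feature mask without the
SG bit, the model returns true, which is exactly the case I missed.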
> I think you still need to call validate_xmit_skb()
I can simplify the logic as much as possible so that it only covers
what xsk needs, keeping just the linearization check. That is the only
part of validate_xmit_skb() that xsk could actually run into.
Thanks,
Jason