netdev - Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path

Message-ID: <CAL+tcoAExWe1+Mj8Sg+XxaROJo+Z8ub=74MCturUNfRSSZojgg@mail.gmail.com>
Date: Fri, 28 Nov 2025 20:59:44 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org, 
	bjorn@...nel.org, magnus.karlsson@...el.com, maciej.fijalkowski@...el.com, 
	jonathan.lemon@...il.com, sdf@...ichev.me, ast@...nel.org, 
	daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com, 
	bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path

On Fri, Nov 28, 2025 at 4:40 PM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 11/28/25 2:44 AM, Jason Xing wrote:
> > On Fri, Nov 28, 2025 at 1:58 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >> On 11/27/25 1:49 PM, Jason Xing wrote:
> >>> On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@...hat.com> wrote:
> >>>> On 11/25/25 12:57 PM, Jason Xing wrote:
> >>>>> This patch also removes total ~4% consumption which can be observed
> >>>>> by perf:
> >>>>> |--2.97%--validate_xmit_skb
> >>>>> |          |
> >>>>> |           --1.76%--netif_skb_features
> >>>>> |                     |
> >>>>> |                      --0.65%--skb_network_protocol
> >>>>> |
> >>>>> |--1.06%--validate_xmit_xfrm
> >>>>>
> >>>>> The above result has been verified on different NICs, like I40E. I
> >>>>> saw the number go up by 4%.
> >>>>
> >>>> I must admit this delta is surprising, and does not fit my experience in
> >>>> slightly different scenarios with the plain UDP TX path.
> >>>
> >>> My take is that when the path is extremely hot, even simple arithmetic
> >>> can cause unexpected overhead. You can see the pps is now over
> >>> 2,000,000. I say this because I've done a few similar tests to verify
> >>> this idea.
> >>
> >> Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
> >> (spectre and friends), reasonably good H/W (2016 vintage) could do
> >> ~2Mpps with a single plain UDP socket.
> >
> > Interesting number that I'm not aware of. Thanks.
> >
> > But for now it's really hard for xsk (in copy mode) to reach over 2M
> > pps even with some recent optimizations applied. I wonder how you test
> > UDP? Could you share the benchmark here?
> >
> > IMHO, xsk should not be slower than a plain UDP socket. So I think
> > there should be huge room for xsk to improve...
>
> I can agree with that. Do you have baseline UDP figures for your H/W?

No, sorry. I'm going to figure out how to run a UDP test comparable to
xdpsock. I think netperf/iperf should be fine?
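
For a sense of scale of the figures above, here is a rough
back-of-the-envelope sketch. It assumes a single core that is fully busy
sending, so that a share of perf samples maps directly onto a share of
the per-packet time. The helper names ns_per_packet() and slice_ns() are
made up for illustration and are not part of any kernel or tool API.

```c
/* Per-packet time budget at a given packet rate (illustrative only). */

/* Nanoseconds available per packet at a rate of `pps` packets/second. */
double ns_per_packet(double pps)
{
    return 1e9 / pps;
}

/*
 * Nanoseconds per packet attributed to a code path that accounts for
 * `percent` of the samples, assuming the core is 100% busy sending.
 */
double slice_ns(double pps, double percent)
{
    return ns_per_packet(pps) * percent / 100.0;
}
```

Under that assumption, ns_per_packet(2000000.0) is 500 ns and
slice_ns(2000000.0, 4.0) is 20 ns, so the ~4% seen in perf corresponds
to roughly 20 ns per packet at 2 Mpps.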

>
> >> Also validate_xmit_xfrm() should be basically a no-op, possibly some bad
> >> luck with icache?
> >
> > Maybe. I strongly feel that I need to work on the layout of those structures.
> >>
> >> Could you please try the attached patch instead?
> >
> > Yep, and I didn't manage to see any improvement.
>
> That is unexpected. At the very least that 1% due to validate_xmit_xfrm()

Ah, I finally realize why you asked about xfrm. The perf graph I
provided in the log was generated on my VM a few months ago, while the
test that I did today ran on a physical server. The one thing common to
both setups is that validate_xmit_skb() introduces additional overhead.

> should go away. Could you please share the exact perf command line you
> are using? Sometimes I see weird artifacts in perf reports that go away
> adding the ":ppp" modifier on the command line, i.e.:
>
> perf record -ag -e cycles:ppp <workload>

I will try this one :)

>
> >> I think you still need to call validate_xmit_skb()
> >
> > I can simplify the whole logic as much as possible so that it keeps
> > only what xsk needs: the linear check. That is the only case that
> > xsk could run into.
>
> What about checksum offload? If I read correctly xsk could build
> CSUM_PARTIAL skbs, and they will need skb_csum_hwoffload_help().

Thanks for the reminder. It pushed me to go through all the details
again as thoroughly as I can. Apparently I missed the
xsk_skb_metadata() function, as I have never used it before.
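
To illustrate why the CSUM_PARTIAL case matters: when the device cannot
offload the checksum, it has to be computed in software before the
packet reaches the driver. Below is a minimal user-space sketch of the
16-bit ones' complement Internet checksum (RFC 1071). It is not the
kernel's skb_csum_hwoffload_help() implementation, and the function name
inet_checksum() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement sum over big-endian 16-bit words (RFC 1071).
 * An odd trailing byte is padded with a zero byte on the right.
 * The result is returned as a host-order value of the big-endian words.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    /* Add up all complete 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* Handle a trailing odd byte, if any. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7 this returns
0x220d, and running it over the same data with the checksum appended
returns 0. This work scales with the packet length, which is why it
cannot simply be skipped for skbs that still need it.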

>
> Generally speaking if validate_xmit_skb() takes a relevant slice of time
> for frequently generated traffic, I guess we should try to optimize it.

I agree with this, since I can definitely see the overhead through
perf[1] on every machine I own.

[1] perf record -g -p <pid> -- sleep 10

>
> @Eric: if you have the data handy, do you see validate_xmit_skb() as a
> relevant cost in your UDP xmit tests?
>
> Thanks,
>
> Paolo
>