lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoCiL0jjUO8RPiWX-+9VtjQm50ZeM5MQXn3Q6m+yNYryzQ@mail.gmail.com>
Date: Tue, 15 Jul 2025 07:53:19 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com, 
	maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me, 
	ast@...nel.org, daniel@...earbox.net, hawk@...nel.org, 
	john.fastabend@...il.com, joe@...a.to, willemdebruijn.kernel@...il.com, 
	bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] xsk: skip validating skb list in xmit path

On Tue, Jul 15, 2025 at 12:03 AM Stanislav Fomichev
<stfomichev@...il.com> wrote:
>
> On 07/13, Jason Xing wrote:
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > For xsk, it's not needed to validate and check the skb in
> > validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't
> > and doesn't need to prepare those requisites to validate. Xsk is just
> > responsible for delivering raw data from userspace to the driver.
>
> So the __dev_direct_xmit was taken out of af_packet in commit 865b03f21162
> ("dev: packet: make packet_direct_xmit a common function"). And a call
> to validate_xmit_skb_list was added in 104ba78c9880 ("packet: on direct_xmit,
> limit tso and csum to supported devices") to support TSO. Since we don't
> support tso/vlan offloads in xsk_build_skb, removing validate_xmit_skb_list
> seems fair.

Right, if you don't mind, I think I will copy&paste your words in the
next respin.

>
> Although, again, if you care about performance, why not use zerocopy
> mode?

I attached the performance impact because I'm working on the different
modes in xsk to see how it really behaves. You can take it as a kind
of investigation :)

I like zc mode, but the fact is that:
1) with ixgbe driver, my machine could totally lose connection as long
as the xsk tries to send packets. I'm still debugging it :(
2) some customers using virtio_net don't have a supported host, so
copy mode is the only one choice.

>
> > Skipping numerous checks somehow contributes to the transmission
> > especially in the extremely hot path.
> >
> > Performance-wise, I used './xdpsock -i enp2s0f0np0 -t  -S -s 64' to verify
> > the guess and then measured on the machine with ixgbe driver. It stably
> > goes up by 5.48%, which can be seen in the shown below:
> > Before:
> >  sock0@...2s0f0np0:0 txonly xdp-skb
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 1,187,410      3,513,536
> > After:
> >  sock0@...2s0f0np0:0 txonly xdp-skb
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 1,252,590      2,459,456
> >
> > This patch also removes total ~4% consumption which can be observed
> > by perf:
> > |--2.97%--validate_xmit_skb
> > |          |
> > |           --1.76%--netif_skb_features
> > |                     |
> > |                      --0.65%--skb_network_protocol
> > |
> > |--1.06%--validate_xmit_xfrm
>
> It is a bit surprising that mostly no-op validate_xmit_skb_list takes
> 4% of the cycles. netif_skb_features taking ~2%? Any idea why? Is
> it unoptimized kernel? Which driver is it?

No idea on this one, sorry. I tested with different drivers (like
i40e) and it turned out to be nearly the same result.

One of my machines looks like this:
# lspci -vv | grep -i ether
02:00.0 Ethernet controller: Intel Corporation Ethernet Controller
10-Gigabit X540-AT2 (rev 01)
02:00.1 Ethernet controller: Intel Corporation Ethernet Controller
10-Gigabit X540-AT2 (rev 01)
# lscpu
Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             46 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      48
  On-line CPU(s) list:       0-47
Vendor ID:                   GenuineIntel
  BIOS Vendor ID:            Intel(R) Corporation
  Model name:                Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
    BIOS Model name:         Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
 CPU @ 2.3GHz
    BIOS CPU family:         179
    CPU family:              6
    Model:                   63
    Thread(s) per core:      2
    Core(s) per socket:      12
    Socket(s):               2
    Stepping:                2
    CPU(s) scaling MHz:      96%
    CPU max MHz:             3100.0000
    CPU min MHz:             1200.0000
    BogoMIPS:                4589.31
    Flags:                   fpu vme de pse tsc msr pae mce cx8 apic
sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx pdpe1gb rdtscp lm constant_ts
                             c arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm p
                             cid dca sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault
epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow fl
                             expriority ept vpid ept_ad fsgsbase
tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
cqm_occup_llc dtherm ida arat pln pts vnmi md_clear flush_l
                             1d

>
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> >  include/linux/netdevice.h |  4 ++--
> >  net/core/dev.c            | 10 ++++++----
> >  net/xdp/xsk.c             |  2 +-
> >  3 files changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index a80d21a14612..2df44c22406c 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
> >                    struct net_device *sb_dev);
> >
> >  int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
> > -int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
> > +int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate);
> >
> >  static inline int dev_queue_xmit(struct sk_buff *skb)
> >  {
> > @@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> >  {
> >       int ret;
> >
> > -     ret = __dev_direct_xmit(skb, queue_id);
> > +     ret = __dev_direct_xmit(skb, queue_id, true);
> >       if (!dev_xmit_complete(ret))
> >               kfree_skb(skb);
> >       return ret;
>
> Implementation wise, will it be better if we move a call to validate_xmit_skb_list
> from __dev_direct_xmit to dev_direct_xmit (and a few other callers of
> __dev_direct_xmit)? This will avoid the new flag.

__dev_direct_xmit() helper was developed to serve the xsk type a few
years ago. For now it has only two callers. If we expect to avoid a
new parameter, we will also move the dev check[1] as below to the
callers of __dev_direct_xmit(). Then move validate_xmit_skb_list to
__dev_direct_xmit(). It's not that concise, I assume? I'm not sure if
I miss your point.

[1]
if (unlikely(!netif_running(dev) ||  !netif_carrier_ok(dev)))
        goto drop;

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ