Message-ID: <aH3rRHm8rQ35MqMd@soc-5CG4396X81.clients.intel.com>
Date: Mon, 21 Jul 2025 09:24:52 +0200
From: Larysa Zaremba <larysa.zaremba@...el.com>
To: Jason Xing <kerneljasonxing@...il.com>
CC: Jakub Kicinski <kuba@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
	"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
	<przemyslaw.kitszel@...el.com>, Maciej Fijalkowski
	<maciej.fijalkowski@...el.com>, <intel-wired-lan@...ts.osuosl.org>, netdev
	<netdev@...r.kernel.org>
Subject: Re: ixgbe driver stops sending normal data when using xsk

On Sat, Jul 19, 2025 at 01:26:18PM +0800, Jason Xing wrote:
> On Fri, Jul 18, 2025 at 6:27 PM Larysa Zaremba <larysa.zaremba@...el.com> wrote:
> >
> > On Wed, Jul 16, 2025 at 11:41:42AM +0800, Jason Xing wrote:
> > > Hi all,
> > >
> > > I'm currently facing a tough issue caused by zero-copy mode in
> > > xsk with the ixgbe driver loaded. If we use xdpsock to send descs,
> > > then at nearly the same time normal packets from other tx queues
> > > cannot be transmitted/completed at all.
> > >
> > > Here is how I try:
> > > 1. run iperf or ping to see if the transmission is successful.
> > > 2. then run "timeout 5 ./xdpsock -i enp2s0f0 -t  -z -s 64"
> > >
> > > You will find that the whole machine loses connectivity. It only
> > > recovers once xdpsock is stopped by the timeout.
> > >
> > > I tried a lot and then traced down to this line in ixgbe driver:
> > > ixgbe_clean_tx_irq()
> > >     -> if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
> > >             break;
> > > The above line always 'breaks' the sending process.
> > >
> > > I also managed to make the external ixgbe 6.15 work and it turned out
> > > to be the same issue as before.
> > >
> > > I have no idea how to analyze this further in the driver. Could someone
> > > point out a direction that I can take? Is it a known issue?
> > >
> > > Thanks,
> > > Jason
> > >
> >
> > I was able to reproduce the described behaviour: xdpsock does break the IP
> > communication. However, in my case this was not because of ixgbe being unable
> > to send, but because of queue 0 RX packets being dropped, which is the intended
> > outcome in xdpsock, even in Tx-only mode.
> 
> Thanks for your feedback. It would be great if you could elaborate
> more on this. How did you spot that it's queue 0 that causes the
> problem?

If you do not specify the -q parameter, xdpsock attaches to queue pair 0.

> Why is xdpsock breaking IP communication intended?

Because when a packet arrives on the AF_XDP-managed queue (0 in this case), the 
default xdpsock XDP program provided by libxdp returns XDP_REDIRECT even in 
tx-only mode, and XDP_PASS for all other queues (1-39). XDP_REDIRECT means the 
packet leaves the kernel network stack and is from then on managed by the AF_XDP 
userspace program. I think it is possible to modify libxdp to return XDP_PASS 
when the socket is tx-only.

> 
> When you try i40e, you will find the connection behaves normally. Ping
> works as usual. As I described before, with the ixgbe driver, ping
> doesn't work at all.

I think this is due to RSS configuration: on i40e, the ping packets go to 
another queue.

> 
> iperf is the one that I should not have listed... because I find iperf
> never works with either driver loaded.
> 
> >
> > When I run `tcpdump -nn -e -p -i <ifname>` on the link partner, I see that the
> > ixgbe host spams ARP packets just fine.
> 
> Interesting. I managed to see the same phenomenon.
> 
> I debugged the ixgbe and saw the following code breaks the whole
> sending process:
> ixgbe_clean_tx_irq()
>      -> if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>              break;
> 
> Do you have any idea why?
>

This line checks whether HW has already sent the packet, so the driver can 
reclaim resources. If the packet has not yet been sent, there is nothing for 
the driver to do but wait.

> >
> > When debugging low-level stuff such as XDP, I advise you to send packets at the
> > lower level, e.g. with scapy's sendp().
> >
> > In case you have a different problem, please provide lspci card description and
> > some truncated output of the commands that you are running and the resulting
> > dmesg.
> 
> I'm not that sure if they are the same.
> 
> One of ixgbe machines that I manipulate looks like this:
> # lspci -vv | grep -i ether
> 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller
> 10-Gigabit X540-AT2 (rev 01)
> 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller
> 10-Gigabit X540-AT2 (rev 01)
>

Device-specific quirks on older cards sometimes result in bad XDP behaviour, 
but such problems are usually visible in dmesg.

> # dmesg -T|grep -i ixgbe
> [Fri Jul 18 16:20:29 2025] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
> [Fri Jul 18 16:20:29 2025] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
> [Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx
> Queue count = 48, Tx Queue count = 48 XDP Queue count = 0
> [Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: 32.000 Gb/s available
> PCIe bandwidth (5.0 GT/s PCIe x8 link)
> [Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: MAC: 3, PHY: 0, PBA No:
> 000000-000
> [Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: f0:98:38:1a:5d:4e
> [Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: Intel(R) 10 Gigabit
> Network Connection
> [Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: Multiqueue Enabled: Rx
> Queue count = 48, Tx Queue count = 48 XDP Queue count = 0
> [Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: 32.000 Gb/s available
> PCIe bandwidth (5.0 GT/s PCIe x8 link)
> [Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: MAC: 3, PHY: 0, PBA No:
> 000000-000
> [Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: f0:98:38:1a:5d:4f
> [Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: Intel(R) 10 Gigabit
> Network Connection
> [Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.0 enp2s0f0np0: renamed from eth0
> [Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1 enp2s0f1np1: renamed from eth1
> [Fri Jul 18 16:20:38 2025] ixgbe 0000:02:00.0: registered PHC device
> on enp2s0f0np0
> [Fri Jul 18 16:20:38 2025] ixgbe 0000:02:00.0 enp2s0f0np0: NIC Link is
> Up 1 Gbps, Flow Control: None
> [Fri Jul 18 16:20:38 2025] ixgbe 0000:02:00.1: registered PHC device
> on enp2s0f1np1
> [Sat Jul 19 13:11:30 2025] ixgbe 0000:02:00.0: removed PHC on enp2s0f0np0
> [Sat Jul 19 13:11:31 2025] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx
> Queue count = 48, Tx Queue count = 48 XDP Queue count = 48
> [Sat Jul 19 13:11:31 2025] ixgbe 0000:02:00.0: registered PHC device
> on enp2s0f0np0
> [Sat Jul 19 13:11:31 2025] ixgbe 0000:02:00.0 enp2s0f0np0: NIC Link is
> Up 1 Gbps, Flow Control: None
> [Sat Jul 19 13:11:34 2025] ixgbe 0000:02:00.0: removed PHC on enp2s0f0np0
> [Sat Jul 19 13:11:34 2025] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx
> Queue count = 48, Tx Queue count = 48 XDP Queue count = 0
> [Sat Jul 19 13:11:35 2025] ixgbe 0000:02:00.0: registered PHC device
> on enp2s0f0np0
> [Sat Jul 19 13:11:35 2025] ixgbe 0000:02:00.0 enp2s0f0np0: NIC Link is
> Up 1 Gbps, Flow Control: None
> 
> reproduce process:
> 1. timeout 3 ./xdpsock -i enp2s0f0np0 -t  -z -s 64
> 2. ping <another IP address>
> 
> Thanks,
> Jason
