Message-ID: <CAL+tcoCJ9ghWVQ1afD_WJmx-3n+80Th7jPw-N-k9Z6ZjJErSkw@mail.gmail.com>
Date: Sat, 19 Jul 2025 13:26:18 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Larysa Zaremba <larysa.zaremba@...el.com>
Cc: Jakub Kicinski <kuba@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, przemyslaw.kitszel@...el.com,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>, intel-wired-lan@...ts.osuosl.org,
netdev <netdev@...r.kernel.org>
Subject: Re: ixgbe driver stops sending normal data when using xsk
On Fri, Jul 18, 2025 at 6:27 PM Larysa Zaremba <larysa.zaremba@...el.com> wrote:
>
> On Wed, Jul 16, 2025 at 11:41:42AM +0800, Jason Xing wrote:
> > Hi all,
> >
> > I'm currently faced with one tough issue caused by zero copy mode in
> > xsk with ixgbe driver loaded. The case is that if we use xdpsock to
> > send descs, nearly at the same time normal packets from other tx
> > queues cannot be transmitted/completed at all.
> >
> > Here is how I try:
> > 1. run iperf or ping to see if the transmission is successful.
> > 2. then run "timeout 5 ./xdpsock -i enp2s0f0 -t -z -s 64"
> >
> > You will obviously find the whole machine loses connection. It can
> > only recover as soon as the xdpsock is stopped due to timeout.
> >
> > I tried a lot and then traced down to this line in ixgbe driver:
> > ixgbe_clean_tx_irq()
> >     -> if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
> >                break;
> > The above line always 'breaks' the sending process.
> >
> > I also managed to make the external ixgbe 6.15 work and it turned out
> > to be the same issue as before.
> >
> > I have no idea on how to analyze further in this driver. Could someone
> > point out a direction that I can take? Is it a known issue?
> >
> > Thanks,
> > Jason
> >
>
> I was able to reproduce the described behaviour, xdpsock does break the IP
> communication. However, in my case this was not because of ixgbe not being able
> to send, but because of queue 0 RX packets being dropped, which is the intended
> outcome in xdpsock, even in Tx only mode.
Thanks for your feedback. It would be great if you could elaborate
more on this. How did you determine that queue 0 is the one causing the
problem? And why is it intended for xdpsock to break IP communication?

When you try i40e, you will find the connection behaves normally and
ping works as usual. As I described before, with the ixgbe driver ping
doesn't work at all.

I should not have listed iperf, though, because I find that iperf never
works with either driver loaded.
>
> When I run `tcpdump -nn -e -p -i <ifname>` on the link partner, I see that the
> ixgbe host spams ARP packets just fine.
Interesting. I see the same thing here.

I debugged ixgbe and saw that the following check always takes the
'break' and stops the whole sending process:
ixgbe_clean_tx_irq()
    -> if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
               break;

Do you have any idea why?
>
> When debugging low-level stuff such as XDP, I advise you to send packets at the
> lower level, e.g. with scapy's sendp().
>
> In case you have a different problem, please provide lspci card description and
> some truncated output of the commands that you are running and the resulting
> dmesg.
I'm not sure whether they are the same problem.

One of the ixgbe machines that I work with looks like this:
# lspci -vv | grep -i ether
02:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
02:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

# dmesg -T|grep -i ixgbe
[Fri Jul 18 16:20:29 2025] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
[Fri Jul 18 16:20:29 2025] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 48, Tx Queue count = 48 XDP Queue count = 0
[Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: MAC: 3, PHY: 0, PBA No: 000000-000
[Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: f0:98:38:1a:5d:4e
[Fri Jul 18 16:20:29 2025] ixgbe 0000:02:00.0: Intel(R) 10 Gigabit Network Connection
[Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: Multiqueue Enabled: Rx Queue count = 48, Tx Queue count = 48 XDP Queue count = 0
[Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: MAC: 3, PHY: 0, PBA No: 000000-000
[Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: f0:98:38:1a:5d:4f
[Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1: Intel(R) 10 Gigabit Network Connection
[Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.0 enp2s0f0np0: renamed from eth0
[Fri Jul 18 16:20:30 2025] ixgbe 0000:02:00.1 enp2s0f1np1: renamed from eth1
[Fri Jul 18 16:20:38 2025] ixgbe 0000:02:00.0: registered PHC device on enp2s0f0np0
[Fri Jul 18 16:20:38 2025] ixgbe 0000:02:00.0 enp2s0f0np0: NIC Link is Up 1 Gbps, Flow Control: None
[Fri Jul 18 16:20:38 2025] ixgbe 0000:02:00.1: registered PHC device on enp2s0f1np1
[Sat Jul 19 13:11:30 2025] ixgbe 0000:02:00.0: removed PHC on enp2s0f0np0
[Sat Jul 19 13:11:31 2025] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 48, Tx Queue count = 48 XDP Queue count = 48
[Sat Jul 19 13:11:31 2025] ixgbe 0000:02:00.0: registered PHC device on enp2s0f0np0
[Sat Jul 19 13:11:31 2025] ixgbe 0000:02:00.0 enp2s0f0np0: NIC Link is Up 1 Gbps, Flow Control: None
[Sat Jul 19 13:11:34 2025] ixgbe 0000:02:00.0: removed PHC on enp2s0f0np0
[Sat Jul 19 13:11:34 2025] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 48, Tx Queue count = 48 XDP Queue count = 0
[Sat Jul 19 13:11:35 2025] ixgbe 0000:02:00.0: registered PHC device on enp2s0f0np0
[Sat Jul 19 13:11:35 2025] ixgbe 0000:02:00.0 enp2s0f0np0: NIC Link is Up 1 Gbps, Flow Control: None

Steps to reproduce:
1. timeout 3 ./xdpsock -i enp2s0f0np0 -t -z -s 64
2. ping <another IP address>

Thanks,
Jason