Message-ID: <ff7ca6ea-a122-4d7d-9ef2-d091cbdd96d2@hetzner-cloud.de>
Date: Thu, 10 Apr 2025 16:54:35 +0200
From: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>
To: Michal Kubiak <michal.kubiak@...el.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>, Jay Vosburgh
<jv@...sburgh.net>, Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>, intel-wired-lan@...ts.osuosl.org,
netdev@...r.kernel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
sdn@...zner-cloud.de
Subject: Re: [BUG] ixgbe: Detected Tx Unit Hang (XDP)
On 10.04.25 at 16:30, Michal Kubiak wrote:
> On Wed, Apr 09, 2025 at 05:17:49PM +0200, Marcus Wichelmann wrote:
>> Hi,
>>
>> in a setup where I use native XDP to redirect packets to a bonding interface
>> that's backed by two ixgbe slaves, I noticed that the ixgbe driver constantly
>> resets the NIC with the following kernel output:
>>
>> ixgbe 0000:01:00.1 ixgbe-x520-2: Detected Tx Unit Hang (XDP)
>> Tx Queue <4>
>> TDH, TDT <17e>, <17e>
>> next_to_use <181>
>> next_to_clean <17e>
>> tx_buffer_info[next_to_clean]
>> time_stamp <0>
>> jiffies <10025c380>
>> ixgbe 0000:01:00.1 ixgbe-x520-2: tx hang 19 detected on queue 4, resetting adapter
>> ixgbe 0000:01:00.1 ixgbe-x520-2: initiating reset due to tx timeout
>> ixgbe 0000:01:00.1 ixgbe-x520-2: Reset adapter
>>
>> This only occurs in combination with a bonding interface and XDP, so I don't
>> know if this is an issue with ixgbe or the bonding driver.
>> I first discovered this with Linux 6.8.0-57, but kernel 6.14.0 and 6.15.0-rc1
>> show the same issue.
>>
>>
>> I managed to reproduce this bug in a lab environment. Here are some details
>> about my setup and the steps to reproduce the bug:
>>
>> [...]
>>
>> Do you have any ideas what may be causing this issue or what I can do to
>> diagnose this further?
>>
>> Please let me know when I should provide any more information.
>>
>>
>> Thanks!
>> Marcus
>>
>
> Hi Marcus,
Hi Michal,
Thank you for looking into this, and less than 24 hours after my report.
I'm very impressed! ;)
> I have just successfully reproduced the problem on our lab machine. What
> is interesting is that I do not seem to have to use a bonding interface
> to get the "Tx timeout" that causes the adapter to reset.
Interesting. I just tried again but have had no luck so far reproducing it
without a bonding interface. May I ask what your setup looks like?
> I will try to debug the problem more closely and let you know of any
> updates.
>
> Thanks,
> Michal
Great!
Marcus