Message-ID: <20250509094233.197245-1-michal.kubiak@intel.com>
Date: Fri, 9 May 2025 11:42:30 +0200
From: Michal Kubiak <michal.kubiak@...el.com>
To: intel-wired-lan@...ts.osuosl.org
Cc: maciej.fijalkowski@...el.com,
aleksander.lobakin@...el.com,
przemyslaw.kitszel@...el.com,
dawid.osuchowski@...ux.intel.com,
jacob.e.keller@...el.com,
jbrandeburg@...udflare.com,
netdev@...r.kernel.org,
Michal Kubiak <michal.kubiak@...el.com>
Subject: [PATCH iwl-net v2 0/3] Fix XDP loading on machines with many CPUs
Hi,
Some of our customers have reported a crash when trying to load
an XDP program on machines with a large number of CPU cores. After
extensive debugging, it became clear that the root cause of the problem
lies in the Tx scheduler implementation, which does not seem to be able
to handle the creation of a large number of Tx queues (even though this
number does not exceed the number of available queues reported by the
FW).
This series addresses this problem.
First of all, the XDP callback should not crash even if the Tx scheduler
returns an error, so Patch #1 fixes this error handling and makes the
XDP callback fail gracefully.
Patch #2 fixes the problem where the Tx scheduler tries to create too
many nodes even though some of them have already been added to the
scheduler tree.
Finally, Patch #3 improves the Tx scheduler tree rebuild algorithm so
that it adds another VSI support node when one is needed to support
all requested Tx rings.
As testing hints, I include sample failure scenarios below:
  1) Number of LAN Tx/Rx queue pairs: 128
     Number of requested XDP queues: >= 321 and <= 640
     Error message:
        Failed to set LAN Tx queue context, error: -22

  2) Number of LAN Tx/Rx queue pairs: 128
     Number of requested XDP queues: >= 641
     Error message:
        Failed VSI LAN queue config for XDP, error: -5

  3) Number of LAN Tx/Rx queue pairs: 252
     Number of CPUs in the system: 384
     a) Load the XDP program.
     b) Try to change (reduce or increase) the queue number using
        the `ethtool -L` command, for example:
           sudo ethtool -L <interface-name> combined 64
     Error message:
        Failed to set LAN Tx queue context, error: -22
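
For illustration, scenario 3 could be scripted roughly as follows. This
is only a sketch under stated assumptions: the interface name `eth0`,
the object file `xdp_pass.o`, and its `xdp` section name are
placeholders and do not come from this series. Setting the queue count
with `ethtool -L ... combined 252` before loading the program is one
assumed way of reaching the starting state described above. The
commands need root privileges when actually executed.

```shell
#!/bin/sh
# Sketch of failure scenario 3. IFACE and XDP_OBJ are placeholders
# (assumptions), not names taken from the patch series.
IFACE="${IFACE:-eth0}"
XDP_OBJ="${XDP_OBJ:-xdp_pass.o}"

# With DRY_RUN=1 only print the command; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_scenario_3() {
    # Assumed way to reach the starting state: 252 LAN Tx/Rx queue pairs.
    run ethtool -L "$IFACE" combined 252 &&
    # a) Load the XDP program (the section name "xdp" is an assumption).
    run ip link set dev "$IFACE" xdp obj "$XDP_OBJ" sec xdp &&
    # b) Change the queue count while the XDP program is attached.
    run ethtool -L "$IFACE" combined 64 &&
    # Detach the XDP program again.
    run ip link set dev "$IFACE" xdp off
}

# Search the kernel log for the error messages quoted in the scenarios.
check_log() {
    dmesg | grep -E \
        'Failed to set LAN Tx queue context|Failed VSI LAN queue config for XDP'
}
```

After sourcing the file, `DRY_RUN=1 repro_scenario_3` prints the
commands without touching the interface. On an affected kernel,
`check_log` would be expected to show one of the messages above.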
Thanks,
Michal
---
v2:
 - fix the bug where the `ethtool -L` command did not work while
   the XDP program was running (Jesse),
 - in patch #3, add a missing extension for `ice_sched_rm_vsi_cfg()`
   to remove all VSI support nodes (including extra ones)
   associated with a given VSI (to fix the root cause of the problem
   mentioned above),
 - add a corresponding description to the commit message of
   patch #3,
 - in the cover letter, add a testing hint for checking the behavior
   of the `ethtool -L` command.

v1: https://lore.kernel.org/netdev/20250422153659.284868-1-michal.kubiak@intel.com/T/#ma677de2cd78d27402eead1d2a41ea0e0f656bc00
Michal Kubiak (3):
  ice: fix Tx scheduler error handling in XDP callback
  ice: create new Tx scheduler nodes for new queues only
  ice: fix rebuilding the Tx scheduler tree for large queue counts

 drivers/net/ethernet/intel/ice/ice_main.c  |  47 ++++--
 drivers/net/ethernet/intel/ice/ice_sched.c | 187 +++++++++++++++++----
 2 files changed, 187 insertions(+), 47 deletions(-)

--
2.45.2