Message-ID: <20250513105529.241745-1-michal.kubiak@intel.com>
Date: Tue, 13 May 2025 12:55:26 +0200
From: Michal Kubiak <michal.kubiak@...el.com>
To: intel-wired-lan@...ts.osuosl.org
Cc: maciej.fijalkowski@...el.com,
aleksander.lobakin@...el.com,
przemyslaw.kitszel@...el.com,
dawid.osuchowski@...ux.intel.com,
jacob.e.keller@...el.com,
jbrandeburg@...udflare.com,
netdev@...r.kernel.org,
Michal Kubiak <michal.kubiak@...el.com>
Subject: [PATCH iwl-net v3 0/3] Fix XDP loading on machines with many CPUs
Hi,
Some of our customers have reported a crash when trying to load
an XDP program on machines with a large number of CPU cores. After
extensive debugging, it became clear that the root cause of the problem
lies in the Tx scheduler implementation, which does not seem to be able
to handle the creation of a large number of Tx queues (even though this
number does not exceed the number of available queues reported by the
FW).
This series addresses this problem.
First of all, the XDP callback should not crash even if the Tx scheduler
returns an error, so Patch #1 fixes this error handling and makes the
XDP callback fail gracefully.
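A minimal sketch of the kind of error unwinding this implies is shown below. It is not the ice driver code: `struct xdp_setup`, `sched_cfg_queues()`, `xdp_setup_queues()` and the `limit` parameter are hypothetical stand-ins, used only to show a callback that releases what it allocated and returns the scheduler's error instead of continuing with a half-configured state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real ice structures. */
struct xdp_setup {
	bool rings_allocated;
	bool prog_attached;
};

/* Hypothetical scheduler call: rejects requests above an assumed limit. */
static int sched_cfg_queues(unsigned int requested, unsigned int limit)
{
	return requested > limit ? -EINVAL : 0;
}

/* Set up XDP queues; on scheduler failure, undo earlier steps and
 * propagate the error to the caller. */
static int xdp_setup_queues(struct xdp_setup *s, unsigned int requested,
			    unsigned int limit)
{
	int err;

	s->rings_allocated = true;	/* pretend the XDP rings were allocated */

	err = sched_cfg_queues(requested, limit);
	if (err)
		goto err_free_rings;

	s->prog_attached = true;	/* only reached when the scheduler succeeded */
	return 0;

err_free_rings:
	s->rings_allocated = false;	/* release what was set up before the failure */
	return err;
}
```

With this shape, a rejected request leaves both flags cleared and the caller sees a negative errno value, so loading the program fails cleanly.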
Patch #2 fixes the problem where the Tx scheduler tries to create too
many nodes even though some of them have already been added to the
scheduler tree.
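The idea of creating nodes for new queues only can be sketched as a simple difference between what is required and what already exists. The helper name `sched_nodes_to_add()` and its parameters are assumptions made for illustration; the real accounting in `ice_sched.c` is done per scheduler layer and is more involved.

```c
/* Hypothetical helper: number of scheduler nodes that still have to be
 * created, given how many are required in total and how many are already
 * present in the tree. Never returns a negative count. */
static unsigned int sched_nodes_to_add(unsigned int required,
				       unsigned int existing)
{
	return required > existing ? required - existing : 0;
}
```

For example, if 10 nodes are required and 4 are already in the tree, only 6 more should be requested from the scheduler.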
Finally, Patch #3 implements an improvement to the Tx scheduler tree
rebuild algorithm to add another VSI support node if it is necessary to
support all requested Tx rings.
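Deciding whether an extra VSI support node is needed amounts to a ceiling division of the requested Tx rings by the capacity of one support node. The sketch below assumes a flat per-node capacity passed in as `rings_per_node`; both that parameter and the helper name `support_nodes_needed()` are hypothetical, and the actual capacity depends on the scheduler tree layout reported by the FW.

```c
/* Hypothetical helper: number of VSI support nodes needed to cover
 * tx_rings, assuming each support node can serve rings_per_node rings.
 * Returns 0 when the capacity is 0 to avoid dividing by zero. */
static unsigned int support_nodes_needed(unsigned int tx_rings,
					 unsigned int rings_per_node)
{
	if (rings_per_node == 0)
		return 0;

	/* Round up so a partially filled node still counts as one node. */
	return (tx_rings + rings_per_node - 1) / rings_per_node;
}
```

With an illustrative (made-up) capacity of 512 rings per node, 512 rings fit in one support node and 768 rings require two.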
As a testing hint, sample failure scenarios are listed below:
1) Number of LAN Tx/Rx queue pairs: 128
   Number of requested XDP queues: >= 321 and <= 640
   Error message:
      Failed to set LAN Tx queue context, error: -22
2) Number of LAN Tx/Rx queue pairs: 128
   Number of requested XDP queues: >= 641
   Error message:
      Failed VSI LAN queue config for XDP, error: -5
Thanks,
Michal
---
v3:
- do not reset the children counter when removing the VSI support
node in patch #3 (Przemek),
- fix the kdoc comment for the newly added `ice_sched_rm_vsi_subtree()`
helper function in patch #3.
v2:
- fix the bug where the `ethtool -L` command did not work while
the XDP program was running (Jesse),
- in patch #3, add a missing extension for `ice_sched_rm_vsi_cfg()`
to remove all VSI support nodes (including extra ones)
associated with a given VSI (to fix the root cause of the problem
mentioned above),
- add a corresponding description to the commit message of
patch #3.
v2: https://lore.kernel.org/netdev/20250509094233.197245-1-michal.kubiak@intel.com/
v1: https://lore.kernel.org/netdev/20250422153659.284868-1-michal.kubiak@intel.com/
Michal Kubiak (3):
ice: fix Tx scheduler error handling in XDP callback
ice: create new Tx scheduler nodes for new queues only
ice: fix rebuilding the Tx scheduler tree for large queue counts
drivers/net/ethernet/intel/ice/ice_main.c | 47 ++++--
drivers/net/ethernet/intel/ice/ice_sched.c | 181 +++++++++++++++++----
2 files changed, 181 insertions(+), 47 deletions(-)
--
2.45.2