Message-ID: <20250509094233.197245-1-michal.kubiak@intel.com>
Date: Fri,  9 May 2025 11:42:30 +0200
From: Michal Kubiak <michal.kubiak@...el.com>
To: intel-wired-lan@...ts.osuosl.org
Cc: maciej.fijalkowski@...el.com,
	aleksander.lobakin@...el.com,
	przemyslaw.kitszel@...el.com,
	dawid.osuchowski@...ux.intel.com,
	jacob.e.keller@...el.com,
	jbrandeburg@...udflare.com,
	netdev@...r.kernel.org,
	Michal Kubiak <michal.kubiak@...el.com>
Subject: [PATCH iwl-net v2 0/3] Fix XDP loading on machines with many CPUs

Hi,

Some of our customers have reported a crash when trying to load
an XDP program on machines with a large number of CPU cores. After
extensive debugging, it became clear that the root cause of the problem
lies in the Tx scheduler implementation, which does not seem to be able
to handle the creation of a large number of Tx queues (even though this
number does not exceed the number of available queues reported by the
FW).
This series addresses this problem.

First of all, the XDP callback should not crash even if the Tx scheduler
returns an error, so Patch #1 fixes this error handling and makes the
XDP callback fail gracefully.
Patch #2 fixes the problem where the Tx scheduler tries to create too
many nodes even though some of them have already been added to the
scheduler tree.
Finally, Patch #3 implements an improvement to the Tx scheduler tree
rebuild algorithm to add another VSI support node if it is necessary to
support all requested Tx rings.

As testing hints, I include sample failure scenarios below:
  1) Number of LAN Tx/Rx queue pairs: 128
     Number of requested XDP queues: >= 321 and <= 640
     Error message:
        Failed to set LAN Tx queue context, error: -22
  2) Number of LAN Tx/Rx queue pairs: 128
     Number of requested XDP queues: >= 641
     Error message:
        Failed VSI LAN queue config for XDP, error: -5
  3) Number of LAN Tx/Rx queue pairs: 252
     Number of CPUs in the system: 384
        a) Load the XDP program.
        b) Try to change (reduce or increase) the queue number using
           the `ethtool -L` command, for example:
                sudo ethtool -L <interface-name> combined 64
     Error message:
        Failed to set LAN Tx queue context, error: -22
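
For anyone scripting scenarios 1) and 2) above, the expected failure codes
can be written down as a small lookup. This is only a sketch for a test
harness, not driver code: the helper name `expected_xdp_load_error()` is
hypothetical, the thresholds are taken directly from the scenarios above
(128 LAN Tx/Rx queue pairs, unpatched driver), and a return value of 0 only
means that this cover letter lists no failure for that queue count.

```c
#include <errno.h>

/*
 * Hypothetical test-harness helper (not part of the ice driver).
 * Returns the error code the unpatched driver is expected to report when
 * loading XDP with 128 LAN Tx/Rx queue pairs, per scenarios 1) and 2):
 *   321..640 XDP queues -> -22 (-EINVAL, "Failed to set LAN Tx queue context")
 *   >= 641 XDP queues   ->  -5 (-EIO, "Failed VSI LAN queue config for XDP")
 * Returns 0 when the cover letter lists no failure for that count.
 */
static int expected_xdp_load_error(unsigned int xdp_queues)
{
	if (xdp_queues >= 641)
		return -EIO;
	if (xdp_queues >= 321)
		return -EINVAL;
	return 0;
}
```

With the series applied, none of these queue counts should produce the
errors above.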

Thanks,
Michal

---

v2:
  - fix the bug where the `ethtool -L` command did not work while
    the XDP program was running (Jesse),
  - in patch #3, add a missing extension for `ice_sched_rm_vsi_cfg()`
    to remove all VSI support nodes (including extra ones)
    associated with a given VSI (to fix the root cause of the problem
    mentioned above),
  - add a corresponding description to the commit message of
    patch #3,
  - in the cover letter, add a testing hint to check the behavior
    of the `ethtool -L` command.

v1: https://lore.kernel.org/netdev/20250422153659.284868-1-michal.kubiak@intel.com/T/#ma677de2cd78d27402eead1d2a41ea0e0f656bc00

Michal Kubiak (3):
  ice: fix Tx scheduler error handling in XDP callback
  ice: create new Tx scheduler nodes for new queues only
  ice: fix rebuilding the Tx scheduler tree for large queue counts

 drivers/net/ethernet/intel/ice/ice_main.c  |  47 ++++--
 drivers/net/ethernet/intel/ice/ice_sched.c | 187 +++++++++++++++++----
 2 files changed, 187 insertions(+), 47 deletions(-)

-- 
2.45.2

