Message-Id: <20260107175548.1792-1-guojinhui.liam@bytedance.com>
Date: Thu,  8 Jan 2026 01:55:45 +0800
From: "Jinhui Guo" <guojinhui.liam@...edance.com>
To: <dakr@...nel.org>, <alexander.h.duyck@...ux.intel.com>, 
	<alexanderduyck@...com>, <bhelgaas@...gle.com>, <bvanassche@....org>, 
	<dan.j.williams@...el.com>, <gregkh@...uxfoundation.org>, 
	<helgaas@...nel.org>, <rafael@...nel.org>, <tj@...nel.org>
Cc: <guojinhui.liam@...edance.com>, <linux-kernel@...r.kernel.org>, 
	<linux-pci@...r.kernel.org>
Subject: [PATCH 0/3] Add NUMA-node-aware synchronous probing to driver core

Hi all,

** Overview **

This patchset introduces NUMA-node-aware synchronous probing.

With it, drivers can initialize and allocate memory on the device’s
local node without scattering kmalloc_node() calls throughout their
code. PCI drivers have had NUMA-aware probing since 2005 and have
benefited from it ever since.

The asynchronous probe path already supports NUMA-node-aware
probing via async_schedule_dev() in the driver core. Since NUMA
affinity is orthogonal to sync/async probing, this patchset adds
NUMA-node-aware support to the synchronous probe path.

** Background **

The idea arose from a discussion with Bjorn and Danilo about a
PCI-probe issue [1]:

When PCI devices on the same NUMA node are probed asynchronously,
pci_call_probe() calls work_on_cpu(), which pins every probe worker
to the same CPU inside that node and forces the probes to run serially.

Testing three NVMe devices on the same NUMA node of an AMD EPYC 9A64
2.4 GHz processor shows all three probes running on CPU 0:

  nvme 0000:01:00.0: CPU: 0, COMM: kworker/0:1, probe cost: 53372612 ns
  nvme 0000:02:00.0: CPU: 0, COMM: kworker/0:2, probe cost: 49532941 ns
  nvme 0000:03:00.0: CPU: 0, COMM: kworker/0:3, probe cost: 47315175 ns
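
To see why pinning matters, it may help to bound the total probe time
in two idealized cases: probes that cannot overlap (wall time is about
the sum of the costs) and probes that each get their own CPU (wall time
is about the slowest probe). The userspace C sketch below models only
that. It is not kernel code, the function names are hypothetical, and
it assumes independent probes and ignores scheduling overhead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Idealized wall time when every probe shares one CPU and none can
 * overlap: the sum of the individual probe costs.
 */
uint64_t wall_time_serialized_ns(const uint64_t *cost_ns, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += cost_ns[i];
	return total;
}

/*
 * Idealized wall time when every probe runs on its own CPU: the cost
 * of the slowest probe.
 */
uint64_t wall_time_parallel_ns(const uint64_t *cost_ns, size_t n)
{
	uint64_t longest = 0;

	for (size_t i = 0; i < n; i++)
		if (cost_ns[i] > longest)
			longest = cost_ns[i];
	return longest;
}
```

Called with the three probe costs logged above, the first helper
returns their sum and the second returns the largest of the three.
Both are bounds for illustration, not measurements.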

Since the driver core already provides NUMA-node-aware asynchronous
probing, we can extend the same capability to the synchronous probe
path. This solves the issue and lets other drivers benefit from
NUMA-local initialization as well.

[1] https://lore.kernel.org/all/20251227113326.964-1-guojinhui.liam@bytedance.com/

** Changes **

The series makes three main changes:

1. Adds helper __device_attach_driver_scan() to eliminate duplication
   between __device_attach() and __device_attach_async_helper().
2. Introduces a NUMA-node-aware execution mechanism and uses it to
   enable NUMA-local synchronous probing in __device_attach(),
   device_driver_attach(), and __driver_attach().
3. Removes the now-redundant NUMA code from the PCI driver.
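
The placement decision behind change 2 can be pictured as a small
function: a device with a valid NUMA node has its probe handed to a
worker on that node, and anything else runs in the caller's context as
before. This is a userspace sketch, not code from the series;
pick_probe_path(), enum probe_path, and nr_nodes are hypothetical
names, and the range check is an assumption. In the kernel the node id
would come from dev_to_node(), and a helper such as queue_work_node()
could carry the work to that node.

```c
#define NUMA_NO_NODE	(-1)	/* the value the kernel uses for "no node" */

enum probe_path {
	PROBE_INLINE,	/* run the probe in the caller's context */
	PROBE_ON_NODE,	/* hand the probe to a worker on the device's node */
};

/*
 * Hypothetical helper: decide where a synchronous probe should run.
 * dev_node is the device's NUMA node id, nr_nodes the number of nodes
 * in the (modelled) system.
 */
enum probe_path pick_probe_path(int dev_node, int nr_nodes)
{
	/* Devices without a node keep the original behaviour. */
	if (dev_node == NUMA_NO_NODE)
		return PROBE_INLINE;

	/* Assumption: treat out-of-range ids the same as "no node". */
	if (dev_node < 0 || dev_node >= nr_nodes)
		return PROBE_INLINE;

	return PROBE_ON_NODE;
}
```

On a two-node model, pick_probe_path(NUMA_NO_NODE, 2) selects the
inline path and pick_probe_path(1, 2) selects the node-local path.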

** Test **

I added debug prints to nvme, mlx5, usbhid, and intel_rapl_msr and
ran tests on an AMD EPYC 9A64 system:

1. Without the patchset
   - PCI drivers (nvme, mlx5) probe sequentially on CPU 0
   - USB and platform drivers probe on whichever CPU the udev worker
     happens to run on

   nvme 0000:01:00.0: CPU: 0, COMM: kworker/0:1, cost: 54013202 ns
   nvme 0000:02:00.0: CPU: 0, COMM: kworker/0:2, cost: 53968911 ns
   nvme 0000:03:00.0: CPU: 0, COMM: kworker/0:4, cost: 48077276 ns
   
   mlx5_core 0000:41:00.0: CPU: 0, COMM: kworker/0:2 cost: 506256717 ns
   mlx5_core 0000:41:00.1: CPU: 0, COMM: kworker/0:2 cost: 514289394 ns
   
   usb 1-2.4: CPU: 163, COMM: (udev-worker), cost 854131 ns
   usb 1-2.6: CPU: 163, COMM: (udev-worker), cost 967993 ns
   
   intel_rapl_msr intel_rapl_msr.0: CPU: 61, COMM: (udev-worker), cost: 3717567 ns

2. With the patchset
   - PCI probes are spread across CPUs inside the device’s NUMA node
   - Asynchronous nvme probes are ~35% faster; synchronous mlx5 times
     are unchanged
   - USB probe times are virtually identical
   - The platform driver (no NUMA node) falls back to the original path

   nvme 0000:01:00.0: CPU: 130, COMM: kworker/u1025:0, cost: 35074561 ns
   nvme 0000:02:00.0: CPU:   1, COMM: kworker/u1025:6, cost: 34612117 ns
   nvme 0000:03:00.0: CPU:   2, COMM: kworker/u1025:5, cost: 34802918 ns

   mlx5_core 0000:41:00.0: CPU: 128, COMM: kworker/u1025:0, cost: 506214576 ns
   mlx5_core 0000:41:00.1: CPU: 128, COMM: kworker/u1025:0, cost: 514273565 ns

   usb 1-2.4: CPU: 51, COMM: kworker/u1031:2, cost: 933581 ns
   usb 1-2.6: CPU: 51, COMM: kworker/u1031:2, cost: 957237 ns

   intel_rapl_msr intel_rapl_msr.0: CPU: 225, COMM: (udev-worker), cost: 4715967 ns

3. With the patchset, unbind/bind cycles also spread PCI probes across
   CPUs within the device’s NUMA node:

   nvme 0000:02:00.0: CPU: 1, COMM: kworker/u1025:4, cost: 37070897 ns
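
For readers who want to check the nvme speed-up against the logs, the
reduction in probe cost can be computed as (before - after) / before.
The helpers below are a minimal sketch with hypothetical names; they
are plain userspace C and not part of the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Arithmetic mean of n probe costs in nanoseconds (0.0 if n is 0). */
double mean_ns(const uint64_t *cost_ns, size_t n)
{
	double sum = 0.0;

	if (n == 0)
		return 0.0;
	for (size_t i = 0; i < n; i++)
		sum += (double)cost_ns[i];
	return sum / (double)n;
}

/* Percentage by which "after" is lower than "before". */
double percent_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Applied to the nvme costs logged without and with the patchset, the
per-device reductions fall roughly between 28% and 36%, and the
reduction of the three-device mean is about 33%.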

** Final **

Comments and suggestions are welcome.

Best Regards,
Jinhui

---
Jinhui Guo (3):
  driver core: Introduce helper function __device_attach_driver_scan()
  driver core: Add NUMA-node awareness to the synchronous probe path
  PCI: Clean up NUMA-node awareness in pci_bus_type probe

 drivers/base/dd.c        | 173 +++++++++++++++++++++++++++++++--------
 drivers/pci/pci-driver.c |  83 ++-----------------
 include/linux/pci.h      |   1 -
 3 files changed, 148 insertions(+), 109 deletions(-)

-- 
2.20.1
