Message-Id: <20260126091749.307-1-guojinhui.liam@bytedance.com>
Date: Mon, 26 Jan 2026 17:17:49 +0800
From: "Jinhui Guo" <guojinhui.liam@...edance.com>
To: <dan.j.williams@...el.com>
Cc: <alexanderduyck@...com>, <bhelgaas@...gle.com>, <bvanassche@....org>,
<dakr@...nel.org>, <frederic@...nel.org>, <gregkh@...uxfoundation.org>,
<guojinhui.liam@...edance.com>, <helgaas@...nel.org>,
<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<rafael@...nel.org>, <tj@...nel.org>
Subject: Re: [PATCH v2 0/3] Add NUMA-node-aware synchronous probing to driver core

On Fri Jan 23, 2026 17:04:27 -0800, Dan Williams wrote:
> Jinhui Guo wrote:
> > Hi all,
> >
> > ** Overview **
> >
> > This patchset introduces NUMA-node-aware synchronous probing.
> >
> > Drivers can initialize and allocate memory on the device’s local
> > node without scattering kmalloc_node() calls throughout the code.
> > NUMA-aware probing was added to PCI drivers in 2005 and has
> > benefited them ever since.
> >
> > The asynchronous probe path already supports NUMA-node-aware
> > probing via async_schedule_dev() in the driver core. Since NUMA
> > affinity is orthogonal to sync/async probing, this patchset adds
> > NUMA-node-aware support to the synchronous probe path.
> >
> > ** Background **
> >
> > The idea arose from a discussion with Bjorn and Danilo about a
> > PCI-probe issue [1]:
> >
> > when PCI devices on the same NUMA node are probed asynchronously,
> > pci_call_probe() calls work_on_cpu(), pins every probe worker to
> > the same CPU inside that node, and forces the probes to run serially.
> >
> > Testing three NVMe devices on the same NUMA node of an AMD EPYC 9A64
> > 2.4 GHz processor (all on CPU 0):
> >
> > nvme 0000:01:00.0: CPU: 0, COMM: kworker/0:1, probe cost: 53372612 ns
> > nvme 0000:02:00.0: CPU: 0, COMM: kworker/0:2, probe cost: 49532941 ns
> > nvme 0000:03:00.0: CPU: 0, COMM: kworker/0:3, probe cost: 47315175 ns
> >
> > Since the driver core already provides NUMA-node-aware asynchronous
> > probing, we can extend the same capability to the synchronous probe
> > path. This solves the issue and lets other drivers benefit from
> > NUMA-local initialization as well.
>
> I like that from a global benefit perspective, but not necessarily from
> a regression perspective. Is there a minimal fix to PCI to make its
> current workqueue unbound, then if that goes well come back and move all
> devices into this scheme?

Hi Dan,

Thank you for your time, and apologies for the delayed reply.

I understand your concerns about regressions and your preference for a
minimal PCI fix first. However, I believe adding NUMA-node awareness to the
driver core's synchronous probe path is the better solution:

1. The asynchronous path already uses async_schedule_dev(), which relies on
   queue_work_node() to bind workers to specific NUMA nodes, and this causes
   no side effects to driver probing.

2. I initially submitted a PCI-only fix [1], but handling asynchronous
   probing in the PCI driver proved difficult. Using current_is_async()
   works but feels fragile. After discussions with Bjorn and Danilo [2][3],
   moving the solution to the driver core makes distinguishing async/sync
   probing straightforward. Testing shows minimal impact on synchronous
   probe time.

3. If you prefer a PCI-only approach, we could add a flag in struct
   device_driver (default false) that PCI sets during registration. This
   limits the new path to PCI devices while others retain existing behavior.
   The extra code is ~10 lines and can be removed once confidence is
   established.

4. I'm committed to supporting this: I'll include "Fixes:" tags for any
   fallout and provide patches within a month of any report. Since the logic
   mirrors the core async helper, risk should be low, but I'll take full
   responsibility regardless.
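
To make the opt-in idea in point 3 concrete, here is a rough userspace sketch
of the decision the driver core would make. It is only an illustration under
stated assumptions: struct mock_driver, struct mock_device, the
numa_sync_probe flag, enum probe_path and pick_sync_probe_path() are all
hypothetical names invented for this example and are not taken from the
patchset or from the kernel. Only NUMA_NO_NODE (-1) mirrors the kernel's
existing constant.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)	/* same value the kernel uses for "no node" */

/* Hypothetical stand-in for struct device_driver. */
struct mock_driver {
	const char *name;
	bool numa_sync_probe;	/* hypothetical opt-in flag, default false */
};

/* Hypothetical stand-in for struct device. */
struct mock_device {
	const char *name;
	int numa_node;		/* NUMA_NO_NODE if the device has no affinity */
};

enum probe_path {
	PROBE_DIRECT,	/* existing behavior: probe in the caller's context */
	PROBE_ON_NODE,	/* new path: probe on a worker bound to the device's node */
};

/*
 * Decide which synchronous probe path to take. Drivers that did not opt in,
 * and devices without a NUMA node, keep the existing behavior.
 */
enum probe_path pick_sync_probe_path(const struct mock_driver *drv,
				     const struct mock_device *dev)
{
	assert(drv && dev);

	if (!drv->numa_sync_probe)
		return PROBE_DIRECT;
	if (dev->numa_node == NUMA_NO_NODE)
		return PROBE_DIRECT;
	return PROBE_ON_NODE;
}
```

In this sketch only a bus that sets the flag at registration (PCI, in the
proposal above) would ever reach PROBE_ON_NODE; every other driver stays on
the path it uses today, which is what would keep the regression surface small.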

Please let me know if you have other concerns.

[1] https://lore.kernel.org/all/20251230142736.1168-1-guojinhui.liam@bytedance.com/
[2] https://lore.kernel.org/all/20251231165503.GA159243@bhelgaas/
[3] https://lore.kernel.org/all/DFFXIZR1AGTV.2WZ1G2JAU0HFQ@kernel.org/

Best Regards,
Jinhui