Message-Id: <20260108082815.1876-1-guojinhui.liam@bytedance.com>
Date: Thu, 8 Jan 2026 16:28:15 +0800
From: "Jinhui Guo" <guojinhui.liam@...edance.com>
To: <dakr@...nel.org>
Cc: <alexander.h.duyck@...ux.intel.com>, <alexanderduyck@...com>,
<bhelgaas@...gle.com>, <bvanassche@....org>, <dan.j.williams@...el.com>,
<gregkh@...uxfoundation.org>, <guojinhui.liam@...edance.com>,
<helgaas@...nel.org>, <linux-kernel@...r.kernel.org>,
<linux-pci@...r.kernel.org>, <rafael@...nel.org>, <tj@...nel.org>
Subject: Re: [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path

On Wed Jan 07, 2026 at 19:22:15 +0100, Danilo Krummrich wrote:
> On Wed Jan 7, 2026 at 6:55 PM CET, Jinhui Guo wrote:
> > + * __exec_on_numa_node - Execute a function on a specific NUMA node synchronously
> > + * @node: Target NUMA node ID
> > + * @func: The wrapper function to execute
> > + * @arg1: First argument (void *)
> > + * @arg2: Second argument (void *)
> > + *
> > + * Returns the result of the function execution, or -ENODEV if initialization fails.
> > + * If the node is invalid or offline, it falls back to local execution.
> > + */
> > +static int __exec_on_numa_node(int node, numa_func_t func, void *arg1, void *arg2)
> > +{
> > + struct numa_work_ctx ctx;
> > +
> > + /* Fallback to local execution if the node is invalid or offline */
> > + if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
> > + return func(arg1, arg2);
>
> Just a quick drive-by comment (I’ll go through it more thoroughly later).
>
> What about the case where we are already on the requested node?
>
> Also, we should probably set the corresponding CPU affinity for the time we are
> executing func() to prevent migration.

Hi Danilo,

Thank you for your time and helpful comments.

Relying on queue_work_node() for node affinity is safer, even when the current
thread is already running on the target node.
Checking the current CPU and setting the affinity ourselves would mean handling
CPU hotplug and isolated CPUs, corner cases that become complicated quickly.
The PCI driver tried this years ago and ran into numerous problems; delegating
the decision to queue_work_node() avoids repeating that history.
- Commit d42c69972b85 ("[PATCH] PCI: Run PCI driver initialization on local node")
first added NUMA awareness with set_cpus_allowed_ptr().
- Commit 1ddd45f8d76f ("PCI: Use cpu_hotplug_disable() instead of get_online_cpus()")
handled CPU-hotplug.
- Commits 69a18b18699b ("PCI: Restrict probe functions to housekeeping CPUs") and
9d42ea0d6984 ("pci: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch") dealt
with isolated CPUs.
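
For illustration, here is a minimal sketch of the queue_work_node() usage
discussed above (simplified: the numa_work_ctx fields, the numa_work_fn()
helper and the use of system_unbound_wq are placeholders, and the -ENODEV
error handling mentioned in the kernel-doc is omitted):

struct numa_work_ctx {
	struct work_struct work;
	numa_func_t func;
	void *arg1;
	void *arg2;
	int ret;
};

static void numa_work_fn(struct work_struct *work)
{
	struct numa_work_ctx *ctx = container_of(work, struct numa_work_ctx, work);

	/* Runs on an unbound worker that prefers the requested node. */
	ctx->ret = ctx->func(ctx->arg1, ctx->arg2);
}

static int __exec_on_numa_node(int node, numa_func_t func, void *arg1, void *arg2)
{
	struct numa_work_ctx ctx;

	/* Fallback to local execution if the node is invalid or offline */
	if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
		return func(arg1, arg2);

	ctx.func = func;
	ctx.arg1 = arg1;
	ctx.arg2 = arg2;
	INIT_WORK_ONSTACK(&ctx.work, numa_work_fn);
	/* Let the workqueue pick a suitable CPU on or near the target node. */
	queue_work_node(node, system_unbound_wq, &ctx.work);
	flush_work(&ctx.work);
	destroy_work_on_stack(&ctx.work);

	return ctx.ret;
}

With this shape, the hotplug and CPU-isolation corner cases stay inside the
workqueue code instead of being reimplemented in the driver core.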

I considered setting CPU affinity, but the performance gain would be minimal:
1. Driver probing happens mainly at boot, when load is light, so queuing a worker
incurs little delay.
2. When there are many devices, they are usually spread across the nodes, so the
workers queued to any single NUMA node are not stalled for long.
3. Even after pinning, the scheduler can still migrate the task within the NUMA
node by load balancing, so the reduction in context switches compared with using
queue_work_node() alone is negligible.

Test data [1] shows that going through queue_work_node() has a negligible impact
on synchronous probe time.
[1] https://lore.kernel.org/all/20260107175548.1792-1-guojinhui.liam@bytedance.com/

If you have any other concerns, please let me know.

Best Regards,
Jinhui