Message-ID: <697832e9f00c3_309510076@dwillia2-mobl4.notmuch>
Date: Mon, 26 Jan 2026 19:37:13 -0800
From: <dan.j.williams@...el.com>
To: Jinhui Guo <guojinhui.liam@...edance.com>, <dan.j.williams@...el.com>
CC: <alexanderduyck@...com>, <bhelgaas@...gle.com>, <bvanassche@....org>,
<dakr@...nel.org>, <frederic@...nel.org>, <gregkh@...uxfoundation.org>,
<guojinhui.liam@...edance.com>, <helgaas@...nel.org>,
<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<rafael@...nel.org>, <tj@...nel.org>
Subject: Re: [PATCH v2 0/3] Add NUMA-node-aware synchronous probing to driver core
Jinhui Guo wrote:
[..]
> > I like that from a global benefit perspective, but not necessarily from
> > a regression perspective. Is there a minimal fix to PCI to make its
> > current workqueue unbound, then if that goes well come back and move all
> > devices into this scheme?
>
> Hi Dan,
>
> Thank you for your time, and apologies for the delayed reply.
I would not have read an earlier reply over this weekend anyway, so no
worries.
> I understand your concerns about stability and hope for better PCI regression
> handling. However, I believe introducing NUMA-node awareness to the driver
> core's asynchronous probe path is the better solution:
>
> 1. The asynchronous path already uses async_schedule_dev() with queue_work_node()
> to bind workers to specific NUMA nodes—this causes no side effects to driver
> probing.
> 2. I initially submitted a PCI-only fix [1], but handling asynchronous probing in
> PCI driver proved difficult. Using current_is_async() works but feels fragile.
> After discussions with Bjorn and Danilo [2][3], moving the solution to driver
> core makes distinguishing async/sync probing straightforward. Testing shows
> minimal impact on synchronous probe time.
> 3. If you prefer a PCI-only approach, we could add a flag in struct device_driver
> (default false) that PCI sets during registration. This limits the new path to
> PCI devices while others retain existing behavior. The extra code is ~10 lines
> and can be removed once confidence is established.
I am open to this option. One demonstration of how this conversion can
cause odd surprises is what it does to locking assumptions. For example,
I ran into the work_on_cpu(..., local_pci_probe...) behavior with some
of the work-in-progress confidential device work [1]. I was surprised
when lockdep_assert_held() fired in a driver probe context.
I like that buses can opt in to this behavior rather than having it
forced on them, similar to how async behavior is handled as an opt-in.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm.git/tree/drivers/base/coco.c?h=staging#n86
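For illustration only, here is a minimal user-space sketch of the two points
above: a per-driver opt-in flag that selects whether probe runs directly or
on a worker, and why a "lock held" check can fail once probe runs on a
worker. This is not code from the series or from the kernel. The names
struct toy_driver, probe_on_worker, struct task and run_on_worker() are all
hypothetical, and the per-task counter is only a rough model of how lockdep
tracks held locks per task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a task; held locks are tracked per task (assumption: rough lockdep analogy). */
struct task {
	const char *comm;
	int locks_held;
};

static struct task caller_task = { "caller", 0 };
static struct task worker_task = { "kworker", 0 };
static struct task *current_task = &caller_task;

/* Hypothetical stand-in for struct device_driver; the flag name is invented. */
struct toy_driver {
	const char *name;
	bool probe_on_worker;	/* opt-in, defaults to false */
	int (*probe)(void);
};

/* Returns 1 if the task running probe holds the lock, 0 otherwise. */
static int probe_checks_lock(void)
{
	return current_task->locks_held > 0;
}

/* Models offloading: the caller waits while a different task runs fn. */
static int run_on_worker(int (*fn)(void))
{
	struct task *saved = current_task;
	int ret;

	current_task = &worker_task;
	ret = fn();
	current_task = saved;
	return ret;
}

/* Dispatch on the opt-in flag; drivers that do not set it keep the direct call. */
static int call_probe(const struct toy_driver *drv)
{
	assert(drv != NULL && drv->probe != NULL);
	if (drv->probe_on_worker)
		return run_on_worker(drv->probe);
	return drv->probe();
}

/* The core takes a lock in the caller's context, then probes. */
int probe_with_lock_held(const struct toy_driver *drv)
{
	int ret;

	current_task->locks_held++;
	ret = call_probe(drv);
	current_task->locks_held--;
	return ret;
}
```

In this model a driver that leaves probe_on_worker false sees the lock as
held inside probe, while one that sets it does not, because the lock is
accounted to the caller's task and not to the worker's.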
> 4. I'm committed to supporting this: I'll include "Fixes:" tags for any fallout
> and provide patches within a month of any report. Since the logic mirrors the
> core async helper, risk should be low—but I'll take full responsibility
> regardless.
Sounds good.

With the above change you can add:

Acked-by: Dan Williams <dan.j.williams@...el.com>

...and I may carve out some time to upgrade that to Reviewed-by on the
next posting.