Message-ID: <aMwQcVZeTwuk2Q8A@localhost.localdomain>
Date: Thu, 18 Sep 2025 16:00:17 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Waiman Long <llong@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Marco Crivellari <marco.crivellari@...e.com>,
Michal Hocko <mhocko@...e.com>,
Peter Zijlstra <peterz@...radead.org>, Tejun Heo <tj@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>, linux-pci@...r.kernel.org
Subject: Re: [PATCH 02/33] PCI: Protect against concurrent change of
housekeeping cpumask
On Fri, Aug 29, 2025 at 06:01:17PM -0400, Waiman Long wrote:
> On 8/29/25 11:47 AM, Frederic Weisbecker wrote:
> > HK_TYPE_DOMAIN will soon integrate cpuset isolated partitions and
> > therefore be made modifiable at runtime. Synchronize against the cpumask
> > update using RCU.
> >
> > The RCU locked section includes both the housekeeping CPU target
> > election for the PCI probe work and the work enqueue.
> >
> > This way the housekeeping update side will simply need to flush the
> > pending related works after updating the housekeeping mask in order to
> > make sure that no PCI work ever executes on an isolated CPU.
> >
> > Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> > ---
> > drivers/pci/pci-driver.c | 40 +++++++++++++++++++++++++++++++---------
> > 1 file changed, 31 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 63665240ae87..cf2b83004886 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -302,9 +302,8 @@ struct drv_dev_and_id {
> > const struct pci_device_id *id;
> > };
> > -static long local_pci_probe(void *_ddi)
> > +static int local_pci_probe(struct drv_dev_and_id *ddi)
> > {
> > - struct drv_dev_and_id *ddi = _ddi;
> > struct pci_dev *pci_dev = ddi->dev;
> > struct pci_driver *pci_drv = ddi->drv;
> > struct device *dev = &pci_dev->dev;
> > @@ -338,6 +337,19 @@ static long local_pci_probe(void *_ddi)
> > return 0;
> > }
> > +struct pci_probe_arg {
> > + struct drv_dev_and_id *ddi;
> > + struct work_struct work;
> > + int ret;
> > +};
> > +
> > +static void local_pci_probe_callback(struct work_struct *work)
> > +{
> > + struct pci_probe_arg *arg = container_of(work, struct pci_probe_arg, work);
> > +
> > + arg->ret = local_pci_probe(arg->ddi);
> > +}
> > +
> > static bool pci_physfn_is_probed(struct pci_dev *dev)
> > {
> > #ifdef CONFIG_PCI_IOV
> > @@ -362,34 +374,44 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> > dev->is_probed = 1;
> > cpu_hotplug_disable();
> > -
> > /*
> > * Prevent nesting work_on_cpu() for the case where a Virtual Function
> > * device is probed from work_on_cpu() of the Physical device.
> > */
> > if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
> > pci_physfn_is_probed(dev)) {
> > - cpu = nr_cpu_ids;
> > + error = local_pci_probe(&ddi);
> > } else {
> > cpumask_var_t wq_domain_mask;
> > + struct pci_probe_arg arg = { .ddi = &ddi };
> > +
> > + INIT_WORK_ONSTACK(&arg.work, local_pci_probe_callback);
> > if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
> > error = -ENOMEM;
> > goto out;
> > }
> > +
> > + rcu_read_lock();
> > cpumask_and(wq_domain_mask,
> > housekeeping_cpumask(HK_TYPE_WQ),
> > housekeeping_cpumask(HK_TYPE_DOMAIN));
> > cpu = cpumask_any_and(cpumask_of_node(node),
> > wq_domain_mask);
> > + if (cpu < nr_cpu_ids) {
> > + schedule_work_on(cpu, &arg.work);
> > + rcu_read_unlock();
> > + flush_work(&arg.work);
> > + error = arg.ret;
> > + } else {
> > + rcu_read_unlock();
> > + error = local_pci_probe(&ddi);
> > + }
> > +
> > free_cpumask_var(wq_domain_mask);
> > + destroy_work_on_stack(&arg.work);
> > }
> > -
> > - if (cpu < nr_cpu_ids)
> > - error = work_on_cpu(cpu, local_pci_probe, &ddi);
> > - else
> > - error = local_pci_probe(&ddi);
> > out:
> > dev->is_probed = 0;
> > cpu_hotplug_enable();
>
> A question. Is the purpose of open-coding work_on_cpu() to avoid calling
> INIT_WORK_ONSTACK() and destroy_work_on_stack() in an RCU read-side critical
> section? These two may call debugobjects code, and I don't know whether
> that is allowed inside an rcu_read_lock() critical section.
>
> Cheers, Longman
No, the point is that I need to keep the target selection
(the housekeeping_cpumask() read) and the work enqueue within the same
RCU read-side critical section, so that things are synchronized this way:

CPU 0                                           CPU 1
-----                                           -----
rcu_read_lock()                                 housekeeping_update()
cpu = cpumask_any(housekeeping_cpumask(...))    housekeeping_cpumask &= ~val
queue_work_on(cpu, pci_probe_wq, work)          synchronize_rcu()
rcu_read_unlock()                               flush_workqueue(pci_probe_wq)
flush_work(work)

And I can't include the whole work_on_cpu() within rcu_read_lock() because
flush_work() may sleep.
Also, now that you mention it, I need to create that pci_probe_wq and flush it :-)
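
To make the ordering argument concrete, here is a rough single-threaded
userspace model of it. This is only a sketch, not kernel code: every name in
it (hk_mask, pending[], model_read_lock(), model_synchronize(),
probe_enqueue(), model_housekeeping_update(), ...) is hypothetical, and
model_synchronize() merely asserts that no reader is active instead of
waiting for a grace period as synchronize_rcu() would.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical model state: one bit per housekeeping CPU. */
static unsigned int hk_mask = 0xF;
/* Work items queued but not yet run, per CPU. */
static int pending[NR_CPUS];
/* Work items that have run, per CPU. */
static int executed[NR_CPUS];
/* Read-side nesting depth; stands in for rcu_read_lock() state. */
static int readers;

static void model_read_lock(void)   { readers++; }
static void model_read_unlock(void) { readers--; }

/*
 * Stand-in for synchronize_rcu(). A real grace period waits for all
 * pre-existing readers; this single-threaded model can only check
 * that none is active.
 */
static void model_synchronize(void) { assert(readers == 0); }

/*
 * Reader side: pick a housekeeping CPU and enqueue the work inside
 * the same read-side section. Returns the chosen CPU, or -1 if the
 * mask is empty.
 */
static int probe_enqueue(void)
{
	int cpu, target = -1;

	model_read_lock();
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (hk_mask & (1u << cpu)) {
			target = cpu;
			pending[cpu]++;	/* models queue_work_on() */
			break;
		}
	}
	model_read_unlock();
	return target;
}

/* Run everything queued on @cpu; models flushing the workqueue. */
static void flush_cpu(int cpu)
{
	executed[cpu] += pending[cpu];
	pending[cpu] = 0;
}

/*
 * Update side: clear the isolated CPUs from the mask, wait for
 * readers, then flush whatever they managed to queue beforehand.
 */
static void model_housekeeping_update(unsigned int isolate)
{
	int cpu;

	hk_mask &= ~isolate;
	model_synchronize();
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (isolate & (1u << cpu))
			flush_cpu(cpu);
}
```

In this model, a work item queued on CPU 0 before the update is drained by
the flush, and any later probe_enqueue() call only sees the new mask, so
nothing stays pending on the isolated CPU once the update returns.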
--
Frederic Weisbecker
SUSE Labs