linux-kernel - Re: [PATCH 02/33] PCI: Protect against concurrent change of housekeeping cpumask

Message-ID: <aMwQcVZeTwuk2Q8A@localhost.localdomain>
Date: Thu, 18 Sep 2025 16:00:17 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Waiman Long <llong@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Marco Crivellari <marco.crivellari@...e.com>,
	Michal Hocko <mhocko@...e.com>,
	Peter Zijlstra <peterz@...radead.org>, Tejun Heo <tj@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Vlastimil Babka <vbabka@...e.cz>, linux-pci@...r.kernel.org
Subject: Re: [PATCH 02/33] PCI: Protect against concurrent change of
 housekeeping cpumask

On Fri, Aug 29, 2025 at 06:01:17PM -0400, Waiman Long wrote:
> On 8/29/25 11:47 AM, Frederic Weisbecker wrote:
> > HK_TYPE_DOMAIN will soon integrate cpuset isolated partitions and
> > therefore be made modifiable at runtime. Synchronize against the cpumask
> > update using RCU.
> > 
> > The RCU locked section includes both the housekeeping CPU target
> > election for the PCI probe work and the work enqueue.
> > 
> > This way the housekeeping update side will simply need to flush the
> > pending related works after updating the housekeeping mask in order to
> > make sure that no PCI work ever executes on an isolated CPU.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> > ---
> >   drivers/pci/pci-driver.c | 40 +++++++++++++++++++++++++++++++---------
> >   1 file changed, 31 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 63665240ae87..cf2b83004886 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -302,9 +302,8 @@ struct drv_dev_and_id {
> >   	const struct pci_device_id *id;
> >   };
> > -static long local_pci_probe(void *_ddi)
> > +static int local_pci_probe(struct drv_dev_and_id *ddi)
> >   {
> > -	struct drv_dev_and_id *ddi = _ddi;
> >   	struct pci_dev *pci_dev = ddi->dev;
> >   	struct pci_driver *pci_drv = ddi->drv;
> >   	struct device *dev = &pci_dev->dev;
> > @@ -338,6 +337,19 @@ static long local_pci_probe(void *_ddi)
> >   	return 0;
> >   }
> > +struct pci_probe_arg {
> > +	struct drv_dev_and_id *ddi;
> > +	struct work_struct work;
> > +	int ret;
> > +};
> > +
> > +static void local_pci_probe_callback(struct work_struct *work)
> > +{
> > +	struct pci_probe_arg *arg = container_of(work, struct pci_probe_arg, work);
> > +
> > +	arg->ret = local_pci_probe(arg->ddi);
> > +}
> > +
> >   static bool pci_physfn_is_probed(struct pci_dev *dev)
> >   {
> >   #ifdef CONFIG_PCI_IOV
> > @@ -362,34 +374,44 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> >   	dev->is_probed = 1;
> >   	cpu_hotplug_disable();
> > -
> >   	/*
> >   	 * Prevent nesting work_on_cpu() for the case where a Virtual Function
> >   	 * device is probed from work_on_cpu() of the Physical device.
> >   	 */
> >   	if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
> >   	    pci_physfn_is_probed(dev)) {
> > -		cpu = nr_cpu_ids;
> > +		error = local_pci_probe(&ddi);
> >   	} else {
> >   		cpumask_var_t wq_domain_mask;
> > +		struct pci_probe_arg arg = { .ddi = &ddi };
> > +
> > +		INIT_WORK_ONSTACK(&arg.work, local_pci_probe_callback);
> >   		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
> >   			error = -ENOMEM;
> >   			goto out;
> >   		}
> > +
> > +		rcu_read_lock();
> >   		cpumask_and(wq_domain_mask,
> >   			    housekeeping_cpumask(HK_TYPE_WQ),
> >   			    housekeeping_cpumask(HK_TYPE_DOMAIN));
> >   		cpu = cpumask_any_and(cpumask_of_node(node),
> >   				      wq_domain_mask);
> > +		if (cpu < nr_cpu_ids) {
> > +			schedule_work_on(cpu, &arg.work);
> > +			rcu_read_unlock();
> > +			flush_work(&arg.work);
> > +			error = arg.ret;
> > +		} else {
> > +			rcu_read_unlock();
> > +			error = local_pci_probe(&ddi);
> > +		}
> > +
> >   		free_cpumask_var(wq_domain_mask);
> > +		destroy_work_on_stack(&arg.work);
> >   	}
> > -
> > -	if (cpu < nr_cpu_ids)
> > -		error = work_on_cpu(cpu, local_pci_probe, &ddi);
> > -	else
> > -		error = local_pci_probe(&ddi);
> >   out:
> >   	dev->is_probed = 0;
> >   	cpu_hotplug_enable();
> 
> One question: is the purpose of open-coding work_on_cpu() to avoid calling
> INIT_WORK_ONSTACK() and destroy_work_on_stack() inside an RCU read-side
> critical section? Both may call into debugobjects code, and I don't know
> whether that is allowed inside an rcu_read_lock() critical section.
> 
> Cheers, Longman

No, the point is that I need to keep the target selection
(the housekeeping_cpumask() read) and the work enqueue within the same
RCU read-side critical section, so that things are synchronized this way:

    CPU 0                                          CPU 1
    -----                                          -----
    rcu_read_lock()                                housekeeping_update()
    cpu = cpumask_any(housekeeping_cpumask(...))       housekeeping_cpumask &= ~val
    queue_work_on(cpu, pci_probe_wq, work)             synchronize_rcu()
    rcu_read_unlock()                                  flush_workqueue(pci_probe_wq)
    flush_work(work)
        
And I can't include the whole work_on_cpu() within rcu_read_lock() because
flush_work() may sleep.
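
To make the target selection step concrete, here is a minimal userspace
sketch of it. It is only a model: the 64-bit mask type, NR_CPU_IDS,
mask_first() and pick_probe_cpu() are hypothetical stand-ins for the
kernel cpumask API, and picking the lowest set bit is just one valid way
to pick "any" CPU. In the kernel, this read and the following enqueue are
what must sit inside the same rcu_read_lock() section.

```c
#include <stdint.h>

#define NR_CPU_IDS 64u /* hypothetical: one bit per CPU in a 64-bit mask */

/* Return the lowest set bit, or NR_CPU_IDS if the mask is empty. */
static unsigned int mask_first(uint64_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Pick a CPU that is on the device's node and set in both housekeeping
 * masks. A return value >= NR_CPU_IDS means "no candidate": the caller
 * falls back to probing on the local CPU.
 */
static unsigned int pick_probe_cpu(uint64_t node_mask, uint64_t hk_wq,
				   uint64_t hk_domain)
{
	return mask_first(node_mask & hk_wq & hk_domain);
}
```

If the housekeeping mask could change between this selection and the
enqueue, the work might land on a CPU that has just become isolated,
which is why both steps share one read-side section.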

Also, now that you mention it, I need to create that pci_probe_wq and flush it :-)
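
For readers less familiar with the on-stack work item used in the quoted
patch (struct pci_probe_arg plus local_pci_probe_callback()), the pattern
can be sketched in plain userspace C. This is an illustration under
stated assumptions, not the kernel code: struct work_struct here is a
hypothetical one-field stand-in, container_of() is a simplified version
of the kernel macro, do_probe() is an invented probe body, and
run_work_sync() replaces the schedule_work_on() + flush_work() pair by
running the callback inline.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct work_struct (hypothetical). */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Simplified container_of(): recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mirrors the shape of struct pci_probe_arg from the quoted patch. */
struct probe_arg {
	int input;               /* stands in for struct drv_dev_and_id *ddi */
	struct work_struct work;
	int ret;
};

/* Hypothetical probe body: 0 on success, -1 on bad input. */
static int do_probe(int input)
{
	return input >= 0 ? 0 : -1;
}

/* The callback only receives the work pointer; the result goes in ->ret. */
static void probe_callback(struct work_struct *work)
{
	struct probe_arg *arg = container_of(work, struct probe_arg, work);

	arg->ret = do_probe(arg->input);
}

/* Stand-in for schedule_work_on() + flush_work(): runs the work inline. */
static void run_work_sync(struct work_struct *work)
{
	assert(work->func != NULL);
	work->func(work);
}

static int call_probe(int input)
{
	struct probe_arg arg = { .input = input };

	arg.work.func = probe_callback;
	run_work_sync(&arg.work);
	return arg.ret;
}
```

The argument can live on the caller's stack only because the caller waits
for the work to complete before returning, which is the role flush_work()
plays in the patch.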

-- 
Frederic Weisbecker
SUSE Labs