Message-ID: <f1545ac2-9a4e-49e9-b918-205f617ec900@redhat.com>
Date: Mon, 22 Sep 2025 17:51:39 -0400
From: Waiman Long <llong@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>, Waiman Long <llong@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
 Marco Crivellari <marco.crivellari@...e.com>, Michal Hocko
 <mhocko@...e.com>, Peter Zijlstra <peterz@...radead.org>,
 Tejun Heo <tj@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
 Vlastimil Babka <vbabka@...e.cz>, linux-pci@...r.kernel.org
Subject: Re: [PATCH 02/33] PCI: Protect against concurrent change of
 housekeeping cpumask

On 9/18/25 10:00 AM, Frederic Weisbecker wrote:
> On Fri, Aug 29, 2025 at 06:01:17PM -0400, Waiman Long wrote:
>> On 8/29/25 11:47 AM, Frederic Weisbecker wrote:
>>> HK_TYPE_DOMAIN will soon integrate cpuset isolated partitions and
>>> therefore be made modifiable at runtime. Synchronize against the cpumask
>>> update using RCU.
>>>
>>> The RCU locked section includes both the housekeeping CPU target
>>> election for the PCI probe work and the work enqueue.
>>>
>>> This way the housekeeping update side will simply need to flush the
>>> pending related works after updating the housekeeping mask in order to
>>> make sure that no PCI work ever executes on an isolated CPU.
>>>
>>> Signed-off-by: Frederic Weisbecker<frederic@...nel.org>
>>> ---
>>>    drivers/pci/pci-driver.c | 40 +++++++++++++++++++++++++++++++---------
>>>    1 file changed, 31 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>> index 63665240ae87..cf2b83004886 100644
>>> --- a/drivers/pci/pci-driver.c
>>> +++ b/drivers/pci/pci-driver.c
>>> @@ -302,9 +302,8 @@ struct drv_dev_and_id {
>>>    	const struct pci_device_id *id;
>>>    };
>>> -static long local_pci_probe(void *_ddi)
>>> +static int local_pci_probe(struct drv_dev_and_id *ddi)
>>>    {
>>> -	struct drv_dev_and_id *ddi = _ddi;
>>>    	struct pci_dev *pci_dev = ddi->dev;
>>>    	struct pci_driver *pci_drv = ddi->drv;
>>>    	struct device *dev = &pci_dev->dev;
>>> @@ -338,6 +337,19 @@ static long local_pci_probe(void *_ddi)
>>>    	return 0;
>>>    }
>>> +struct pci_probe_arg {
>>> +	struct drv_dev_and_id *ddi;
>>> +	struct work_struct work;
>>> +	int ret;
>>> +};
>>> +
>>> +static void local_pci_probe_callback(struct work_struct *work)
>>> +{
>>> +	struct pci_probe_arg *arg = container_of(work, struct pci_probe_arg, work);
>>> +
>>> +	arg->ret = local_pci_probe(arg->ddi);
>>> +}
>>> +
>>>    static bool pci_physfn_is_probed(struct pci_dev *dev)
>>>    {
>>>    #ifdef CONFIG_PCI_IOV
>>> @@ -362,34 +374,44 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>>>    	dev->is_probed = 1;
>>>    	cpu_hotplug_disable();
>>> -
>>>    	/*
>>>    	 * Prevent nesting work_on_cpu() for the case where a Virtual Function
>>>    	 * device is probed from work_on_cpu() of the Physical device.
>>>    	 */
>>>    	if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
>>>    	    pci_physfn_is_probed(dev)) {
>>> -		cpu = nr_cpu_ids;
>>> +		error = local_pci_probe(&ddi);
>>>    	} else {
>>>    		cpumask_var_t wq_domain_mask;
>>> +		struct pci_probe_arg arg = { .ddi = &ddi };
>>> +
>>> +		INIT_WORK_ONSTACK(&arg.work, local_pci_probe_callback);
>>>    		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
>>>    			error = -ENOMEM;
>>>    			goto out;
>>>    		}
>>> +
>>> +		rcu_read_lock();
>>>    		cpumask_and(wq_domain_mask,
>>>    			    housekeeping_cpumask(HK_TYPE_WQ),
>>>    			    housekeeping_cpumask(HK_TYPE_DOMAIN));
>>>    		cpu = cpumask_any_and(cpumask_of_node(node),
>>>    				      wq_domain_mask);
>>> +		if (cpu < nr_cpu_ids) {
>>> +			schedule_work_on(cpu, &arg.work);
>>> +			rcu_read_unlock();
>>> +			flush_work(&arg.work);
>>> +			error = arg.ret;
>>> +		} else {
>>> +			rcu_read_unlock();
>>> +			error = local_pci_probe(&ddi);
>>> +		}
>>> +
>>>    		free_cpumask_var(wq_domain_mask);
>>> +		destroy_work_on_stack(&arg.work);
>>>    	}
>>> -
>>> -	if (cpu < nr_cpu_ids)
>>> -		error = work_on_cpu(cpu, local_pci_probe, &ddi);
>>> -	else
>>> -		error = local_pci_probe(&ddi);
>>>    out:
>>>    	dev->is_probed = 0;
>>>    	cpu_hotplug_enable();
>> A question: is the purpose of open-coding work_on_cpu() to avoid calling
>> INIT_WORK_ONSTACK() and destroy_work_on_stack() in an RCU read-side critical
>> section? These two may call debugobjects code, and I don't know whether
>> that is allowed inside an rcu_read_lock() critical section.
>>
>> Cheers, Longman
> No, the point is that I need to keep the target selection
> (housekeeping_cpumask() read) and the work enqueue within the same
> RCU critical section so that things are synchronized this way:
>
>      CPU 0                                          CPU 1
>      -----                                          -----
>      rcu_read_lock()                                housekeeping_update()
>      cpu = cpumask_any(housekeeping_cpumask(...))       housekeeping_cpumask &= ~val
>      queue_work_on(cpu, pci_probe_wq, work)             synchronize_rcu()
>      rcu_read_unlock()                                  flush_workqueue(pci_probe_wq)
>      flush_work(work)
>          
> And I can't include the whole work_on_cpu() within rcu_read_lock() because
> flush_work() may sleep.

Right, you are trying to avoid calling flush_work() within an
rcu_read_lock() critical section. It would be easier to review if you
mentioned that in the commit log.
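
For readers following the ordering argument, here is a minimal userspace
sketch of it in C11. It is only a model under stated assumptions, not
kernel code: every model_* name, the 8-CPU bitmask, the reader counter
standing in for rcu_read_lock()/synchronize_rcu(), and the per-CPU pending
counters standing in for the workqueue are hypothetical. The reader picks
its target and enqueues inside the read-side section, and waits only after
leaving it. The updater shrinks the mask, waits for readers, then drains
the queued works.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static _Atomic uint64_t hk_mask = 0xFF;    /* all 8 model CPUs are housekeeping */
static _Atomic uint64_t isolated_done;     /* CPUs whose isolation has completed */
static atomic_int readers;                 /* stand-in for the RCU read-side section */
static atomic_int pending[MODEL_NR_CPUS];  /* queued but unfinished works per CPU */
static atomic_int violations;              /* works run on a fully isolated CPU */

static void model_read_lock(void)   { atomic_fetch_add(&readers, 1); }
static void model_read_unlock(void) { atomic_fetch_sub(&readers, 1); }

/*
 * Stand-in for synchronize_rcu(). It busy-waits until no reader is inside
 * the section, which is stricter than a real grace period but enough here.
 */
static void model_synchronize(void)
{
	while (atomic_load(&readers) != 0)
		;
}

/*
 * Stand-in for the work body plus flush_work(). The real flush may sleep,
 * so the caller must already have left the read-side section.
 */
static void model_run_and_flush(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (atomic_load(&isolated_done) & (UINT64_C(1) << cpu))
		atomic_fetch_add(&violations, 1);
	atomic_fetch_sub(&pending[cpu], 1);
}

/* Reader side: returns the chosen CPU, or -1 when the probe runs locally. */
static int model_probe(uint64_t node_mask)
{
	uint64_t candidates;
	int cpu = -1;

	model_read_lock();
	candidates = atomic_load(&hk_mask) & node_mask;
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		if (candidates & (UINT64_C(1) << i)) {
			cpu = i;
			/* Enqueue inside the same section as the mask read. */
			atomic_fetch_add(&pending[cpu], 1);
			break;
		}
	}
	model_read_unlock();

	if (cpu >= 0)
		model_run_and_flush(cpu);	/* wait only after unlocking */
	return cpu;
}

/* Update side: shrink the mask, wait for readers, then drain queued works. */
static void model_isolate(uint64_t cpus)
{
	atomic_fetch_and(&hk_mask, ~cpus);
	model_synchronize();
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		while ((cpus & (UINT64_C(1) << i)) &&
		       atomic_load(&pending[i]) != 0)
			;
	atomic_fetch_or(&isolated_done, cpus);
}
```

In this model, model_probe(0x0F) picks CPU 0 while every CPU is still
housekeeping. Once model_isolate(0x01) has returned, the same call picks
CPU 1, model_probe(0x01) falls back to the local path, and violations
stays at 0.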

>
> Also now that you mention it, I need to create that pci_probe_wq and flush it :-)

OK, another wq :-)

Cheers,
Longman

