Message-ID: <f1545ac2-9a4e-49e9-b918-205f617ec900@redhat.com>
Date: Mon, 22 Sep 2025 17:51:39 -0400
From: Waiman Long <llong@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>, Waiman Long <llong@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Marco Crivellari <marco.crivellari@...e.com>, Michal Hocko
<mhocko@...e.com>, Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>, linux-pci@...r.kernel.org
Subject: Re: [PATCH 02/33] PCI: Protect against concurrent change of
housekeeping cpumask
On 9/18/25 10:00 AM, Frederic Weisbecker wrote:
> On Fri, Aug 29, 2025 at 06:01:17PM -0400, Waiman Long wrote:
>> On 8/29/25 11:47 AM, Frederic Weisbecker wrote:
>>> HK_TYPE_DOMAIN will soon integrate cpuset isolated partitions and
>>> therefore be made modifiable at runtime. Synchronize against the cpumask
>>> update using RCU.
>>>
>>> The RCU locked section includes both the housekeeping CPU target
>>> election for the PCI probe work and the work enqueue.
>>>
>>> This way the housekeeping update side will simply need to flush the
>>> pending related works after updating the housekeeping mask in order to
>>> make sure that no PCI work ever executes on an isolated CPU.
>>>
>>> Signed-off-by: Frederic Weisbecker<frederic@...nel.org>
>>> ---
>>> drivers/pci/pci-driver.c | 40 +++++++++++++++++++++++++++++++---------
>>> 1 file changed, 31 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>> index 63665240ae87..cf2b83004886 100644
>>> --- a/drivers/pci/pci-driver.c
>>> +++ b/drivers/pci/pci-driver.c
>>> @@ -302,9 +302,8 @@ struct drv_dev_and_id {
>>>  	const struct pci_device_id *id;
>>>  };
>>> -static long local_pci_probe(void *_ddi)
>>> +static int local_pci_probe(struct drv_dev_and_id *ddi)
>>>  {
>>> -	struct drv_dev_and_id *ddi = _ddi;
>>>  	struct pci_dev *pci_dev = ddi->dev;
>>>  	struct pci_driver *pci_drv = ddi->drv;
>>>  	struct device *dev = &pci_dev->dev;
>>> @@ -338,6 +337,19 @@ static long local_pci_probe(void *_ddi)
>>>  	return 0;
>>>  }
>>> +struct pci_probe_arg {
>>> +	struct drv_dev_and_id *ddi;
>>> +	struct work_struct work;
>>> +	int ret;
>>> +};
>>> +
>>> +static void local_pci_probe_callback(struct work_struct *work)
>>> +{
>>> +	struct pci_probe_arg *arg = container_of(work, struct pci_probe_arg, work);
>>> +
>>> +	arg->ret = local_pci_probe(arg->ddi);
>>> +}
>>> +
>>>  static bool pci_physfn_is_probed(struct pci_dev *dev)
>>>  {
>>>  #ifdef CONFIG_PCI_IOV
>>> @@ -362,34 +374,44 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>>>  	dev->is_probed = 1;
>>>  	cpu_hotplug_disable();
>>> -
>>>  	/*
>>>  	 * Prevent nesting work_on_cpu() for the case where a Virtual Function
>>>  	 * device is probed from work_on_cpu() of the Physical device.
>>>  	 */
>>>  	if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
>>>  	    pci_physfn_is_probed(dev)) {
>>> -		cpu = nr_cpu_ids;
>>> +		error = local_pci_probe(&ddi);
>>>  	} else {
>>>  		cpumask_var_t wq_domain_mask;
>>> +		struct pci_probe_arg arg = { .ddi = &ddi };
>>> +
>>> +		INIT_WORK_ONSTACK(&arg.work, local_pci_probe_callback);
>>>  		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
>>>  			error = -ENOMEM;
>>>  			goto out;
>>>  		}
>>> +
>>> +		rcu_read_lock();
>>>  		cpumask_and(wq_domain_mask,
>>>  			    housekeeping_cpumask(HK_TYPE_WQ),
>>>  			    housekeeping_cpumask(HK_TYPE_DOMAIN));
>>>  		cpu = cpumask_any_and(cpumask_of_node(node),
>>>  				      wq_domain_mask);
>>> +		if (cpu < nr_cpu_ids) {
>>> +			schedule_work_on(cpu, &arg.work);
>>> +			rcu_read_unlock();
>>> +			flush_work(&arg.work);
>>> +			error = arg.ret;
>>> +		} else {
>>> +			rcu_read_unlock();
>>> +			error = local_pci_probe(&ddi);
>>> +		}
>>> +
>>>  		free_cpumask_var(wq_domain_mask);
>>> +		destroy_work_on_stack(&arg.work);
>>>  	}
>>> -
>>> -	if (cpu < nr_cpu_ids)
>>> -		error = work_on_cpu(cpu, local_pci_probe, &ddi);
>>> -	else
>>> -		error = local_pci_probe(&ddi);
>>>  out:
>>>  	dev->is_probed = 0;
>>>  	cpu_hotplug_enable();
>> A question: is the purpose of open-coding work_on_cpu() to avoid calling
>> INIT_WORK_ONSTACK() and destroy_work_on_stack() in an RCU read-side critical
>> section? This macro and function may call debugobjects code, and I don't
>> know whether that is allowed inside an rcu_read_lock() critical section.
>>
>> Cheers, Longman
> No, the point is that I need to keep the target selection
> (housekeeping_cpumask() read) and the work enqueue within the same
> RCU critical section so that things are synchronized this way:
>
>     CPU 0                                            CPU 1
>     -----                                            -----
>     rcu_read_lock()                                  housekeeping_update()
>     cpu = cpumask_any(housekeeping_cpumask(...))     housekeeping_cpumask &= ~val
>     queue_work_on(cpu, pci_probe_wq, work)           synchronize_rcu()
>     rcu_read_unlock()                                flush_workqueue(pci_probe_wq)
>     flush_work(work)
>
> And I can't include the whole work_on_cpu() within rcu_read_lock() because
> flush_work() may sleep.
Right, you are trying to avoid calling flush_work() within the
rcu_read_lock() critical section. It would make the patch easier to
review if you mentioned that in the commit log.
>
> Also now that you mention it, I need to create that pci_probe_wq and flush it :-)
OK, another wq :-)
Cheers,
Longman
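
For readers following the thread, the target election in the quoted
patch can be modelled in plain userspace C. This is only a sketch under
stated assumptions: 64-bit integers stand in for cpumasks, and
MODEL_NR_CPUS, model_pick_probe_cpu() and model_probe_runs_locally() are
hypothetical names that do not exist in the kernel. The model returns
the lowest matching CPU, whereas cpumask_any_and() may return any CPU in
the intersection. It does not model RCU or workqueues; it only shows
that a CPU cleared from either housekeeping mask is never elected, and
that an empty intersection falls back to probing locally.

```c
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids: returned when no CPU qualifies. */
#define MODEL_NR_CPUS 64

/*
 * Model of the target election: intersect the node's CPUs with both
 * housekeeping masks (HK_TYPE_WQ and HK_TYPE_DOMAIN in the patch) and
 * return the lowest CPU in the result, or MODEL_NR_CPUS if it is empty.
 */
int model_pick_probe_cpu(uint64_t node_mask, uint64_t hk_wq_mask,
			 uint64_t hk_domain_mask)
{
	uint64_t candidates = node_mask & hk_wq_mask & hk_domain_mask;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	}
	return MODEL_NR_CPUS;
}

/*
 * Returns 1 when no housekeeping CPU on the node qualifies, which is the
 * case where the patch calls local_pci_probe() directly; 0 otherwise.
 */
int model_probe_runs_locally(uint64_t node_mask, uint64_t hk_wq_mask,
			     uint64_t hk_domain_mask)
{
	return model_pick_probe_cpu(node_mask, hk_wq_mask,
				    hk_domain_mask) == MODEL_NR_CPUS;
}
```

In the model, clearing a bit in hk_domain_mask plays the role of
"housekeeping_cpumask &= ~val" from the diagram: any election made after
the update can no longer return that CPU. Elections that raced with the
update are what the RCU section plus the workqueue flush are meant to
cover, and that part is outside this sketch.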