[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <878qlmqtbn.ffs@tglx>
Date: Fri, 20 Jun 2025 18:19:24 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Nuno Das Neves <nunodasneves@...ux.microsoft.com>, Michael Kelley
<mhklinux@...look.com>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-pci@...r.kernel.org"
<linux-pci@...r.kernel.org>
Cc: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>, "catalin.marinas@....com"
<catalin.marinas@....com>, "will@...nel.org" <will@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"hpa@...or.com" <hpa@...or.com>, "lpieralisi@...nel.org"
<lpieralisi@...nel.org>, "kw@...ux.com" <kw@...ux.com>, "robh@...nel.org"
<robh@...nel.org>, "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"jinankjain@...ux.microsoft.com" <jinankjain@...ux.microsoft.com>,
"skinsburskii@...ux.microsoft.com" <skinsburskii@...ux.microsoft.com>,
"mrathor@...ux.microsoft.com" <mrathor@...ux.microsoft.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 3/4] x86: hyperv: Expose hv_map_msi_interrupt function
On Wed, Jun 18 2025 at 14:08, Nuno Das Neves wrote:
> On 6/11/2025 4:07 PM, Michael Kelley wrote:
>> From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Tuesday, June 10, 2025 4:52 PM
>>> +/**
>>> + * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
>>> + * @data: Describes the IRQ
>>> + * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
>>> + *
>>> + * Map the IRQ in the hypervisor by issuing a MAP_DEVICE_INTERRUPT hypercall.
>>> + */
>>> +int hv_map_msi_interrupt(struct irq_data *data,
>>> + struct hv_interrupt_entry *out_entry)
>>> {
>>> - union hv_device_id device_id = hv_build_pci_dev_id(dev);
>>> + struct msi_desc *msidesc;
>>> + struct pci_dev *dev;
>>> + union hv_device_id device_id;
>>> + struct hv_interrupt_entry dummy;
>>> + struct irq_cfg *cfg = irqd_cfg(data);
>>> + const cpumask_t *affinity;
>>> + int cpu;
>>> + u64 res;
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
>>>
>>> - return hv_map_interrupt(device_id, false, cpu, vector, entry);
>>> + msidesc = irq_data_get_msi_desc(data);
>>> + dev = msi_desc_to_pci_dev(msidesc);
>>> + device_id = hv_build_pci_dev_id(dev);
>>> + affinity = irq_data_get_effective_affinity_mask(data);
>>> + cpu = cpumask_first_and(affinity, cpu_online_mask);
>>
>> Is the cpus_read_lock held at this point? I'm not sure what the
>> overall calling sequence looks like. If it is not held, the CPU that
>> is selected could go offline before hv_map_interrupt() is called.
>> This computation of the target CPU is the same as in the code
>> before this patch, but that existing code looks like it has the
>> same problem.
>>
>
> Thanks for pointing it out - It *looks* like the read lock is not held
> everywhere this could be called, so it could indeed be a problem.
>
> I've been thinking about different ways around this but I lack the
> knowledge to have an informed opinion about it:
>
> - We could take the cpu read lock in this function, would that work?
Obviously not.
> - I'm not actually sure why the code is getting the first cpu off the effective
> affinity mask in the first place. It is possible to get the apic id (and hence
> the cpu) already associated with the irq, as per e.g. x86_vector_msi_compose_msg()
> Maybe we could get the cpu that way, assuming that doesn't have a
> similar issue.
There is no reason to fiddle in the underlying low level data. The
effective affinity mask is there for a reason.
> - We could just let this race happen, maybe the outcome isn't too catastrophic?
Let's terminate guesswork mode and look at the facts.
The point is that hv_map_msi_interrupt() is called from:
1) hv_irq_compose_msi_msg()
2) hv_arch_irq_unmask() (in patch 4/4)
Both functions are interrupt chip callbacks and invoked with the
interrupt descriptor lock held.
At the point where they are called, the effective affinity mask is valid
and immutable. Nothing can modify it as any modification requires the
interrupt descriptor lock to be held. This applies to the CPU hotplug
machinery too. So this AND cpu_online_mask is a complete pointless
voodoo exercise.
Just use:
cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
and be done with it.
Please fix that first with a seperate patch before moving this code
around.
Thanks,
tglx
Powered by blists - more mailing lists