[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2621385c-6fcf-4035-a5a0-5427a08045c8@arm.com>
Date: Thu, 14 Nov 2024 15:35:15 +0000
From: Robin Murphy <robin.murphy@....com>
To: Alex Williamson <alex.williamson@...hat.com>,
Jason Gunthorpe <jgg@...dia.com>
Cc: Nicolin Chen <nicolinc@...dia.com>, tglx@...utronix.de, maz@...nel.org,
bhelgaas@...gle.com, leonro@...dia.com,
shameerali.kolothum.thodi@...wei.com, dlemoal@...nel.org,
kevin.tian@...el.com, smostafa@...gle.com,
andriy.shevchenko@...ux.intel.com, reinette.chatre@...el.com,
eric.auger@...hat.com, ddutile@...hat.com, yebin10@...wei.com,
brauner@...nel.org, apatel@...tanamicro.com,
shivamurthy.shastri@...utronix.de, anna-maria@...utronix.de,
nipun.gupta@....com, marek.vasut+renesas@...lbox.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address
for each MSI vector
On 13/11/2024 9:11 pm, Alex Williamson wrote:
> On Tue, 12 Nov 2024 21:34:30 -0400
> Jason Gunthorpe <jgg@...dia.com> wrote:
>
>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote:
>>>>> To solve this problem the VMM should capture the MSI IOVA allocated by the
>>>>> guest kernel and relay it to the GIC driver in the host kernel, to program
>>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO.
>>>>
>>>> Once VFIO has that information from userspace, though, do we really need
>>>> the whole complicated dance to push it right down into the irqchip layer
>>>> just so it can be passed back up again? AFAICS
>>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
>>>> rewrites MSI-X vectors, so it seems like it should be pretty
>>>> straightforward to override the message address in general at that
>>>> level, without the lower layers having to be aware at all, no?
>>>
>>> Didn't see that clearly!! It works with a simple following override:
>>> --------------------------------------------------------------------
>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>> struct msi_msg msg;
>>>
>>> get_cached_msi_msg(irq, &msg);
>>> + if (vdev->msi_iovas) {
>>> + msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
>>> + msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
>>> + }
>>> pci_write_msi_msg(irq, &msg);
>>> }
>>>
>>> --------------------------------------------------------------------
>>>
>>> With that, I think we only need one VFIO change for this part :)
>>
>> Wow, is that really OK from a layering perspective? The comment is
>> pretty clear on the intention that this is to resync the irq layer
>> view of the device with the physical HW.
>>
>> Editing the msi_msg while doing that resync smells bad.
>>
>> Also, this is only doing MSI-X, we should include normal MSI as
>> well. (it probably should have a resync too?)
>
> This was added for a specific IBM HBA that clears the vector table
> during a built-in self test, so it's possible the MSI table being in
> config space never had the same issue, or we just haven't encountered
> it. I don't expect anything else actually requires this.
Yeah, I wasn't really suggesting to literally hook into this exact case;
it was more just a general observation that if VFIO already has one
justification for tinkering with pci_write_msi_msg() directly without
going through the msi_domain layer, then adding another (wherever it
fits best) can't be *entirely* unreasonable.
At the end of the day, the semantic here is that VFIO does know more
than the IRQ layer, and does need to program the endpoint differently
from what the irqchip assumes, so I don't see much benefit in dressing
that up more than functionally necessary.
>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
>> context)
>
> It seems suspect to me too. In a sense it is still just synchronizing
> the MSI address, but to a different address space.
>
> Is it possible to do this with the existing write_msi_msg callback on
> the msi descriptor? For instance we could simply translate the msg
> address and call pci_write_msi_msg() (while avoiding an infinite
> recursion). Or maybe there should be an xlate_msi_msg callback we can
> register. Or I suppose there might be a way to insert an irqchip that
> does the translation on write. Thanks,
I'm far from keen on the idea, but if there really is an appetite for
more indirection, then I guess the least-worst option would be yet
another type of iommu_dma_cookie to work via the existing
iommu_dma_compose_msi_msg() flow, with some interface for VFIO to update
per-device addresses directly. But then it's still going to need some
kind of "layering violation" for VFIO to poke the IRQ layer into
re-composing and re-writing a message whenever userspace feels like
changing an address, because we're fundamentally stepping outside the
established lifecycle of a kernel-managed IRQ around which said layering
was designed...
Thanks,
Robin.
Powered by blists - more mailing lists