lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2621385c-6fcf-4035-a5a0-5427a08045c8@arm.com>
Date: Thu, 14 Nov 2024 15:35:15 +0000
From: Robin Murphy <robin.murphy@....com>
To: Alex Williamson <alex.williamson@...hat.com>,
 Jason Gunthorpe <jgg@...dia.com>
Cc: Nicolin Chen <nicolinc@...dia.com>, tglx@...utronix.de, maz@...nel.org,
 bhelgaas@...gle.com, leonro@...dia.com,
 shameerali.kolothum.thodi@...wei.com, dlemoal@...nel.org,
 kevin.tian@...el.com, smostafa@...gle.com,
 andriy.shevchenko@...ux.intel.com, reinette.chatre@...el.com,
 eric.auger@...hat.com, ddutile@...hat.com, yebin10@...wei.com,
 brauner@...nel.org, apatel@...tanamicro.com,
 shivamurthy.shastri@...utronix.de, anna-maria@...utronix.de,
 nipun.gupta@....com, marek.vasut+renesas@...lbox.org,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 linux-pci@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address
 for each MSI vector

On 13/11/2024 9:11 pm, Alex Williamson wrote:
> On Tue, 12 Nov 2024 21:34:30 -0400
> Jason Gunthorpe <jgg@...dia.com> wrote:
> 
>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote:
>>>>> To solve this problem the VMM should capture the MSI IOVA allocated by the
>>>>> guest kernel and relay it to the GIC driver in the host kernel, to program
>>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO.
>>>>
>>>> Once VFIO has that information from userspace, though, do we really need
>>>> the whole complicated dance to push it right down into the irqchip layer
>>>> just so it can be passed back up again? AFAICS
>>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
>>>> rewrites MSI-X vectors, so it seems like it should be pretty
>>>> straightforward to override the message address in general at that
>>>> level, without the lower layers having to be aware at all, no?
>>>
>>> Didn't see that clearly!! It works with a simple following override:
>>> --------------------------------------------------------------------
>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>                  struct msi_msg msg;
>>>
>>>                  get_cached_msi_msg(irq, &msg);
>>> +               if (vdev->msi_iovas) {
>>> +                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
>>> +                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
>>> +               }
>>>                  pci_write_msi_msg(irq, &msg);
>>>          }
>>>   
>>> --------------------------------------------------------------------
>>>
>>> With that, I think we only need one VFIO change for this part :)
>>
>> Wow, is that really OK from a layering perspective? The comment is
>> pretty clear on the intention that this is to resync the irq layer
>> view of the device with the physical HW.
>>
>> Editing the msi_msg while doing that resync smells bad.
>>
>> Also, this is only doing MSI-X, we should include normal MSI as
>> well. (it probably should have a resync too?)
> 
> This was added for a specific IBM HBA that clears the vector table
> during a built-in self test, so it's possible the MSI table being in
> config space never had the same issue, or we just haven't encountered
> it.  I don't expect anything else actually requires this.

Yeah, I wasn't really suggesting to literally hook into this exact case; 
it was more just a general observation that if VFIO already has one 
justification for tinkering with pci_write_msi_msg() directly without 
going through the msi_domain layer, then adding another (wherever it 
fits best) can't be *entirely* unreasonable.

At the end of the day, the semantic here is that VFIO does know more 
than the IRQ layer, and does need to program the endpoint differently 
from what the irqchip assumes, so I don't see much benefit in dressing 
that up more than functionally necessary.

>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
>> context)
> 
> It seems suspect to me too.  In a sense it is still just synchronizing
> the MSI address, but to a different address space.
> 
> Is it possible to do this with the existing write_msi_msg callback on
> the msi descriptor?  For instance we could simply translate the msg
> address and call pci_write_msi_msg() (while avoiding an infinite
> recursion).  Or maybe there should be an xlate_msi_msg callback we can
> register.  Or I suppose there might be a way to insert an irqchip that
> does the translation on write.  Thanks,

I'm far from keen on the idea, but if there really is an appetite for 
more indirection, then I guess the least-worst option would be yet 
another type of iommu_dma_cookie to work via the existing 
iommu_dma_compose_msi_msg() flow, with some interface for VFIO to update 
per-device addresses directly. But then it's still going to need some 
kind of "layering violation" for VFIO to poke the IRQ layer into 
re-composing and re-writing a message whenever userspace feels like 
changing an address, because we're fundamentally stepping outside the 
established lifecycle of a kernel-managed IRQ around which said layering 
was designed...

Thanks,
Robin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ