Message-ID: <20250922-50372a07397db3155fec49c9@orel>
Date: Mon, 22 Sep 2025 16:20:43 -0500
From: Andrew Jones <ajones@...tanamicro.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: iommu@...ts.linux.dev, kvm-riscv@...ts.infradead.org,
kvm@...r.kernel.org, linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
zong.li@...ive.com, tjeznach@...osinc.com, joro@...tes.org, will@...nel.org,
robin.murphy@....com, anup@...infault.org, atish.patra@...ux.dev, tglx@...utronix.de,
alex.williamson@...hat.com, paul.walmsley@...ive.com, palmer@...belt.com, alex@...ti.fr
Subject: Re: [RFC PATCH v2 08/18] iommu/riscv: Use MSI table to enable IMSIC
access
On Mon, Sep 22, 2025 at 03:43:36PM -0300, Jason Gunthorpe wrote:
> On Sat, Sep 20, 2025 at 03:38:58PM -0500, Andrew Jones wrote:
> > When setting irq affinity extract the IMSIC address the device
> > needs to access and add it to the MSI table. If the device no
> > longer needs access to an IMSIC then remove it from the table
> > to prohibit access. This allows isolating device MSIs to a set
> > of harts so we can now add the IRQ_DOMAIN_FLAG_ISOLATED_MSI IRQ
> > domain flag.
>
> IRQ_DOMAIN_FLAG_ISOLATED_MSI has nothing to do with HARTs.
>
> * Isolated MSI means that HW modeled by an irq_domain on the path from the
> * initiating device to the CPU will validate that the MSI message specifies an
> * interrupt number that the device is authorized to trigger. This must block
> * devices from triggering interrupts they are not authorized to trigger.
> * Currently authorization means the MSI vector is one assigned to the device.
Unfortunately the RISC-V IOMMU doesn't have support for this. I've raised
the lack of MSI data validation with the spec writers and I'll try to raise
it again, but I was hoping we could still get IRQ_DOMAIN_FLAG_ISOLATED_MSI
by simply ensuring the MSI addresses only include the affined harts (along
with the NOTE comment I've put in this patch to point out the deficiency).
>
> It has to do with each PCI BDF having a unique set of
> validation/mapping tables for MSIs that are granular to the interrupt
> number.
Interrupt numbers (MSI data) aren't used by the RISC-V IOMMU in any way.
>
> As I understand the spec this is only possible with msiptp? As
> discussed previously this has to be a static property and the SW stack
> doesn't expect it to change. So if the IR driver sets
> IRQ_DOMAIN_FLAG_ISOLATED_MSI it has to always use msiptp?
Yes, the patch only sets IRQ_DOMAIN_FLAG_ISOLATED_MSI when the IOMMU
has RISCV_IOMMU_CAPABILITIES_MSI_FLAT and it will remain set for the
lifetime of the irqdomain, no matter how the IOMMU is being applied.
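
Roughly (sketch only, not the actual hunk; "ir_domain" is just a stand-in
for wherever the patch keeps the IR irqdomain):

  /*
   * Sketch: only advertise isolated MSI when the IOMMU provides a
   * flat MSI table (msiptp); otherwise leave the flag clear.
   */
  if (iommu->caps & RISCV_IOMMU_CAPABILITIES_MSI_FLAT)
          ir_domain->flags |= IRQ_DOMAIN_FLAG_ISOLATED_MSI;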
>
> Further, since the interrupt tables have to be per BDF they cannot be
> linked to an iommu_domain! Storing the msiptp in an iommu_domain is
> totally wrong?? It needs to somehow be stored in the interrupt layer
> per-struct device, check how AMD and Intel have stored their IR tables
> programmed into their versions of DC.
The RISC-V IOMMU MSI table is simply a flat address remapping table,
which also has support for MRIFs. The table indices come from an
address matching mechanism used to filter out invalid addresses and
to convert valid addresses into MSI table indices. IOW, the RISC-V
MSI table is a simple translation table, and even needs to be tied to
a particular DMA table in order to work. Here are two examples:
1. stage1 not BARE
------------------

     stage1    MSI table
IOVA ------> A ---------> host-MSI-address

2. stage1 is BARE, for example if only stage2 is in use
-------------------------------------------------------

          MSI table
IOVA == A ---------> host-MSI-address
When used by the host, A == host-MSI-address, but at least we can block
the write when an IRQ has been affined to a set of harts that doesn't
include the hart it targets. When used for irqbypass, A == guest-MSI-address
and the host-MSI-address will be that of a guest interrupt file. This
ensures a device assigned to a guest can only reach its own vcpus when
sending MSIs.
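
To spell out the matching mechanism, here's how I read the spec's address
match and index extraction (standalone sketch, not driver code; mask and
pattern stand for the DC's msi_addr_mask and msi_addr_pattern):

  #include <stdbool.h>
  #include <stdint.h>

  /* A write to 'addr' is treated as an MSI when the page number matches
   * the pattern in every bit position the mask doesn't cover. */
  static bool msi_addr_match(uint64_t addr, uint64_t mask, uint64_t pattern)
  {
          return ((addr >> 12) & ~mask) == (pattern & ~mask);
  }

  /* The MSI table index is the page number's bits selected by the mask,
   * compressed down into contiguous low-order bits. */
  static uint64_t msi_table_index(uint64_t addr, uint64_t mask)
  {
          uint64_t a = addr >> 12, idx = 0;
          unsigned int out = 0, bit;

          for (bit = 0; bit < 52; bit++) {
                  if (mask & (1ULL << bit)) {
                          idx |= ((a >> bit) & 1) << out;
                          out++;
                  }
          }
          return idx;
  }

So "filter out invalid addresses" above is msi_addr_match() failing, and
the result of msi_table_index() is what selects the MSI PTE (or MRIF).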
In the first example, where stage1 is not BARE, the stage1 page tables
must have some IOVA->A mapping, otherwise the MSI table will not get
a chance to do a translation, as the stage1 DMA will fault. This
series ensures stage1 gets an identity mapping for all possible MSI
targets and then leaves it be, using the MSI tables instead for the
isolation.
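
The shape of that mapping is roughly the following (hypothetical helper;
imsic_base/stride/nr_files are made-up parameters for illustration, the
series gets the real addresses from the interrupt controller):

  /* Identity-map every possible IMSIC page into the stage1 (first-stage)
   * page tables so the MSI write survives stage1 and reaches the MSI
   * table (or the IMSIC directly when there is no MSI table). */
  static int map_msi_targets(struct iommu_domain *domain,
                             phys_addr_t imsic_base, size_t stride,
                             unsigned int nr_files)
  {
          unsigned int i;
          int ret;

          for (i = 0; i < nr_files; i++) {
                  phys_addr_t pa = imsic_base + i * stride;

                  /* IOVA == PA, i.e. an identity mapping */
                  ret = iommu_map(domain, pa, pa, PAGE_SIZE,
                                  IOMMU_WRITE | IOMMU_MMIO, GFP_KERNEL);
                  if (ret)
                          return ret;
          }

          return 0;
  }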
I don't think we can apply a lot of AMD's and Intel's model to RISC-V.
>
> It looks like there is something in here to support HW that doesn't
> have msiptp? That's different, and also looks very confused.
The only support is to ensure all the host IMSICs are mapped, otherwise
we can't turn on IOMMU_DMA since all MSI writes would fault. We don't
set IRQ_DOMAIN_FLAG_ISOLATED_MSI in this case, though, since we don't
bother unmapping the MSI addresses of harts from which IRQs have been
unaffined.
> The IR
> driver should never be touching the iommu domain or calling iommu_map!
As pointed out above, the RISC-V IR is quite a different beast than AMD
and Intel. Whether or not the IOMMU has MSI table support, the IMSICs
must be mapped in stage1, when stage1 is not BARE. So, in both cases we
roll that mapping into the IR code since there isn't really any better
place for it for the host case and it's necessary for the IR code to
manage it for the virt case. Since IR (or MSI delivery in general) depends
on the stage1 page tables, it needs to be tied to the same IOMMU domain
that those page tables are tied to. Patch 4's changes to
riscv_iommu_attach_paging_domain() and riscv_iommu_iodir_update() show
how they're tied together.
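
For reference, this is also visible in the device context layout, where
the stage1 pointer and the MSI table pointer sit next to each other (field
names as I recall them from the spec and iommu-bits.h, so double-check):

  struct riscv_iommu_dc {
          u64 tc;                  /* translation control */
          u64 iohgatp;             /* stage2 (G-stage) page table pointer */
          u64 ta;                  /* translation attributes (PSCID) */
          u64 fsc;                 /* stage1 page table pointer */
          u64 msiptp;              /* MSI page table pointer */
          u64 msi_addr_mask;
          u64 msi_addr_pattern;
          u64 _reserved;
  };

which is why updating msiptp naturally falls on the same attach path that
programs fsc.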
> Instead it probably has to use the SW_MSI mechanism to request mapping
> the interrupt controller aperture. You don't get
> IRQ_DOMAIN_FLAG_ISOLATED_MSI with something like this though. Look at
> how ARM GIC works for this mechanism.
I'm not seeing how SW_MSI will help here, but so far I've just done some
quick grepping and code skimming.
>
> Finally, please split this series up, if there are two different ways
> to manage the MSI aperture then please split it into two series with a
> clear description of how the HW actually works.
>
> Maybe start with the simpler case of no msiptp??
The first five patches plus the "enable IOMMU_DMA" patch will allow paging
domains to be used by default, while paving the way for patches 6-8 to
allow host IRQs to be isolated to the best of our ability (only able to
access the IMSICs to which they are affined). So we could have
series1: irqdomain + map all imsics + enable IOMMU_DMA

series2: actually apply irqdomain in order to implement map/unmap of MSI
         ptes based on IRQ affinity - set IRQ_DOMAIN_FLAG_ISOLATED_MSI,
         because that's the best we've got...

series3: the rest of the patches of this series which introduce irqbypass
         support for the virt use case
Would that be better? Or do you also see a need for some patch splits?
Thanks,
drew