[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250205144904.39285de4@DESKTOP-0403QTC.>
Date: Wed, 5 Feb 2025 14:49:04 -0800
From: Jacob Pan <jacob.pan@...ux.microsoft.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: <will@...nel.org>, <robin.murphy@....com>, <jgg@...dia.com>,
<kevin.tian@...el.com>, <tglx@...utronix.de>, <maz@...nel.org>,
<alex.williamson@...hat.com>, <joro@...tes.org>, <shuah@...nel.org>,
<reinette.chatre@...el.com>, <eric.auger@...hat.com>, <yebin10@...wei.com>,
<apatel@...tanamicro.com>, <shivamurthy.shastri@...utronix.de>,
<bhelgaas@...gle.com>, <anna-maria@...utronix.de>, <yury.norov@...il.com>,
<nipun.gupta@....com>, <iommu@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
<kvm@...r.kernel.org>, <linux-kselftest@...r.kernel.org>,
<patches@...ts.linux.dev>, <jean-philippe@...aro.org>, <mdf@...nel.org>,
<mshavit@...gle.com>, <shameerali.kolothum.thodi@...wei.com>,
<smostafa@...gle.com>, <ddutile@...hat.com>, jacob.pan@...ux.microsoft.com
Subject: Re: [PATCH RFCv2 00/13] iommu: Add MSI mapping support with nested
SMMU
Hi Nicolin,
On Fri, 10 Jan 2025 19:32:16 -0800
Nicolin Chen <nicolinc@...dia.com> wrote:
> [ Background ]
> On ARM GIC systems and others, the target address of the MSI is
> translated by the IOMMU. For GIC, the MSI address page is called
> "ITS" page. When the IOMMU is disabled, the MSI address is programmed
> to the physical location of the GIC ITS page (e.g. 0x20200000). When
> the IOMMU is enabled, the ITS page is behind the IOMMU, so the MSI
> address is programmed to an allocated IO virtual address (a.k.a
> IOVA), e.g. 0xFFFF0000, which must be mapped to the physical ITS
> page: IOVA (0xFFFF0000) ===> PA (0x20200000). When a 2-stage
> translation is enabled, IOVA will be still used to program the MSI
> address, though the mappings will be in two stages: IOVA (0xFFFF0000)
> ===> IPA (e.g. 0x80900000) ===> PA (0x20200000) (IPA stands for
> Intermediate Physical Address).
>
> If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA,
> the IOVA is dynamically allocated from the top of the IOVA space. If
> attached to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough
> device), the IOVA is fixed to an MSI window reported by the IOMMU
> driver via IOMMU_RESV_SW_MSI, which is hardwired to MSI_IOVA_BASE
> (IOVA==0x8000000) for ARM IOMMUs.
>
> So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in
> charge of the IOMMU translation (1-stage translation), since the IOVA
> for the ITS page is fixed and known by kernel. However, with virtual
> machine enabling a nested IOMMU translation (2-stage), a guest kernel
> directly controls the stage-1 translation with an IOMMU_DOMAIN_DMA,
> mapping a vITS page (at an IPA 0x80900000) onto its own IOVA space
> (e.g. 0xEEEE0000). Then, the host kernel can't know that guest-level
> IOVA to program the MSI address.
>
> There have been two approaches to solve this problem:
> 1. Create an identity mapping in the stage-1. VMM could insert a few
> RMRs (Reserved Memory Regions) in guest's IORT. Then the guest kernel
> would fetch these RMR entries from the IORT and create an
> IOMMU_RESV_DIRECT region per iommu group for a direct mapping.
> Eventually, the mappings would look like: IOVA (0x8000000) === IPA
> (0x8000000) ===> 0x20200000 This requires an IOMMUFD ioctl for kernel
> and VMM to agree on the IPA.
Should this RMR be in a separate range than MSI_IOVA_BASE? The guest
will have MSI_IOVA_BASE in a reserved region already, no?
e.g. # cat
/sys/bus/pci/devices/0015\:01\:00.0/iommu_group/reserved_regions
0x0000000008000000 0x00000000080fffff msi
Powered by blists - more mailing lists