Message-ID: <4946ea266bdc4b1e8796dee1b228bd8f@huawei.com>
Date: Thu, 23 Jan 2025 09:06:49 +0000
From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
To: Nicolin Chen <nicolinc@...dia.com>, "will@...nel.org" <will@...nel.org>,
	"robin.murphy@....com" <robin.murphy@....com>, "jgg@...dia.com"
	<jgg@...dia.com>, "kevin.tian@...el.com" <kevin.tian@...el.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "maz@...nel.org" <maz@...nel.org>,
	"alex.williamson@...hat.com" <alex.williamson@...hat.com>
CC: "joro@...tes.org" <joro@...tes.org>, "shuah@...nel.org"
	<shuah@...nel.org>, "reinette.chatre@...el.com" <reinette.chatre@...el.com>,
	"eric.auger@...hat.com" <eric.auger@...hat.com>, "yebin (H)"
	<yebin10@...wei.com>, "apatel@...tanamicro.com" <apatel@...tanamicro.com>,
	"shivamurthy.shastri@...utronix.de" <shivamurthy.shastri@...utronix.de>,
	"bhelgaas@...gle.com" <bhelgaas@...gle.com>, "anna-maria@...utronix.de"
	<anna-maria@...utronix.de>, "yury.norov@...il.com" <yury.norov@...il.com>,
	"nipun.gupta@....com" <nipun.gupta@....com>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "linux-kselftest@...r.kernel.org"
	<linux-kselftest@...r.kernel.org>, "patches@...ts.linux.dev"
	<patches@...ts.linux.dev>, "jean-philippe@...aro.org"
	<jean-philippe@...aro.org>, "mdf@...nel.org" <mdf@...nel.org>,
	"mshavit@...gle.com" <mshavit@...gle.com>, "smostafa@...gle.com"
	<smostafa@...gle.com>, "ddutile@...hat.com" <ddutile@...hat.com>
Subject: RE: [PATCH RFCv2 00/13] iommu: Add MSI mapping support with nested
 SMMU

Hi Nicolin,

> -----Original Message-----
> From: Nicolin Chen <nicolinc@...dia.com>
> Sent: Saturday, January 11, 2025 3:32 AM
> To: will@...nel.org; robin.murphy@....com; jgg@...dia.com;
> kevin.tian@...el.com; tglx@...utronix.de; maz@...nel.org;
> alex.williamson@...hat.com
> Cc: joro@...tes.org; shuah@...nel.org; reinette.chatre@...el.com;
> eric.auger@...hat.com; yebin (H) <yebin10@...wei.com>;
> apatel@...tanamicro.com; shivamurthy.shastri@...utronix.de;
> bhelgaas@...gle.com; anna-maria@...utronix.de; yury.norov@...il.com;
> nipun.gupta@....com; iommu@...ts.linux.dev;
> linux-kernel@...r.kernel.org; linux-arm-kernel@...ts.infradead.org;
> kvm@...r.kernel.org; linux-kselftest@...r.kernel.org;
> patches@...ts.linux.dev; jean-philippe@...aro.org; mdf@...nel.org;
> mshavit@...gle.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@...wei.com>; smostafa@...gle.com;
> ddutile@...hat.com
> Subject: [PATCH RFCv2 00/13] iommu: Add MSI mapping support with
> nested SMMU
> 
> [ Background ]
> On ARM GIC systems and others, the target address of the MSI is translated
> by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
> IOMMU is disabled, the MSI address is programmed to the physical location
> of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
> page is behind the IOMMU, so the MSI address is programmed to an allocated
> IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
> the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
> When a 2-stage translation is enabled, IOVA will be still used to program
> the MSI address, though the mappings will be in two stages:
>   IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> PA (0x20200000)
> (IPA stands for Intermediate Physical Address).
> 
> If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
> IOVA is dynamically allocated from the top of the IOVA space. If attached
> to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
> fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
> which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
> 
> So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
> of the IOMMU translation (1-stage translation), since the IOVA for the ITS
> page is fixed and known by kernel. However, with virtual machine enabling
> a nested IOMMU translation (2-stage), a guest kernel directly controls the
> stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
> IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
> kernel can't know that guest-level IOVA to program the MSI address.
> 
> There have been two approaches to solve this problem:
> 1. Create an identity mapping in the stage-1. VMM could insert a few RMRs
>    (Reserved Memory Regions) in guest's IORT. Then the guest kernel would
>    fetch these RMR entries from the IORT and create an IOMMU_RESV_DIRECT
>    region per iommu group for a direct mapping. Eventually, the mappings
>    would look like: IOVA (0x8000000) === IPA (0x8000000) ===> 0x20200000
>    This requires an IOMMUFD ioctl for kernel and VMM to agree on the IPA.
> 2. Forward the guest-level MSI IOVA captured by VMM to the host-level GIC
>    driver, to program the correct MSI IOVA. Forward the VMM-defined vITS
>    page location (IPA) to the kernel for the stage-2 mapping. Eventually:
>    IOVA (0xFFFF0000) ===> IPA (0x80900000) ===> PA (0x20200000)
>    This requires a VFIO ioctl (for IOVA) and an IOMMUFD ioctl (for IPA).
> 
> Worth mentioning that when Eric Auger was working on the same topic with
> the VFIO iommu uAPI, he had the approach (2) first, and then switched to
> the approach (1), suggested by Jean-Philippe for reduction of complexity.
> 
> The approach (1) basically feels like the existing VFIO passthrough that
> has a 1-stage mapping for the unmanaged domain, yet only by shifting the
> MSI mapping from stage 1 (guest-has-no-iommu case) to stage 2
> (guest-has-iommu case). So, it could reuse the existing IOMMU_RESV_SW_MSI
> piece, by sharing the same idea of "VMM leaving everything to the kernel".
> 
> The approach (2) is an ideal solution, yet it requires additional effort
> for kernel to be aware of the 1-stage gIOVA(s) and 2-stage IPAs for vITS
> page(s), which demands VMM to closely cooperate.
>  * It also brings some complicated use cases to the table where the host
>    or/and guest system(s) has/have multiple ITS pages.
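
To make the address examples in the quoted background concrete, the two-stage
lookup can be modelled roughly as below. This is only an illustrative sketch,
not code from the series: the `map_entry` and `stage` types, the table names,
and the `translate()` / `nested_translate()` helpers are all hypothetical, the
tables hold just the example addresses from the cover letter, and a 4 KiB page
size is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xFFFULL /* assumption: 4 KiB pages */

/* Hypothetical page-granular mapping: input page base -> output page base. */
struct map_entry {
	uint64_t in;
	uint64_t out;
};

/* Hypothetical translation stage: a flat table standing in for page tables. */
struct stage {
	const struct map_entry *entries;
	size_t count;
};

/* Look up one stage; keeps the in-page offset. Returns false on a miss. */
static bool translate(const struct stage *st, uint64_t in, uint64_t *out)
{
	for (size_t i = 0; i < st->count; i++) {
		if (st->entries[i].in == (in & ~PAGE_OFFSET_MASK)) {
			*out = st->entries[i].out | (in & PAGE_OFFSET_MASK);
			return true;
		}
	}
	return false;
}

/* Stage 1 maps IOVA -> IPA, stage 2 maps IPA -> PA. */
static bool nested_translate(const struct stage *s1, const struct stage *s2,
			     uint64_t iova, uint64_t *pa)
{
	uint64_t ipa;

	if (!translate(s1, iova, &ipa))
		return false;
	return translate(s2, ipa, pa);
}

/* Approach (1): identity stage 1 (RMR), MSI window mapped at stage 2. */
static const struct map_entry a1_s1_tbl[] = { { 0x8000000ULL, 0x8000000ULL } };
static const struct map_entry a1_s2_tbl[] = { { 0x8000000ULL, 0x20200000ULL } };
static const struct stage approach1_s1 = { a1_s1_tbl, 1 };
static const struct stage approach1_s2 = { a1_s2_tbl, 1 };

/* Approach (2): guest-chosen IOVA at stage 1, vITS IPA at stage 2. */
static const struct map_entry a2_s1_tbl[] = { { 0xFFFF0000ULL, 0x80900000ULL } };
static const struct map_entry a2_s2_tbl[] = { { 0x80900000ULL, 0x20200000ULL } };
static const struct stage approach2_s1 = { a2_s1_tbl, 1 };
static const struct stage approach2_s2 = { a2_s2_tbl, 1 };
```

In this model both approaches end at the same physical ITS page. They differ
in who chooses the stage-1 entry: in approach (1) it is a fixed identity
mapping, while in approach (2) it is guest-chosen, so the host has to be told
the IOVA.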

I ran some basic sanity tests with this series and the QEMU branches you
provided on HiSilicon hardware. Basic device assignment works fine. I will
rebase my QEMU smuv3-accel branch on top of this and run some more tests.

One thing I find unclear in the above text: do we still plan to support
approach (1) (using RMRs in QEMU), or are you mentioning it here only because
it is still possible to make use of it? I think from previous discussions the
argument was to adopt a more dedicated MSI pass-through model, which I
think is approach (2) here. Could you please confirm?

Thanks,
Shameer



