Message-ID: <391ab316-79b1-4535-a45b-4c01bfb80de6@amd.com>
Date: Tue, 12 Dec 2023 00:35:26 +0700
From: "Suthikulpanit, Suravee" <suravee.suthikulpanit@....com>
To: Jason Gunthorpe <jgg@...dia.com>, Yi Liu <yi.l.liu@...el.com>,
"Giani, Dhaval" <Dhaval.Giani@....com>,
Vasant Hegde <vasant.hegde@....com>
Cc: joro@...tes.org, alex.williamson@...hat.com, kevin.tian@...el.com,
robin.murphy@....com, baolu.lu@...ux.intel.com, cohuck@...hat.com,
eric.auger@...hat.com, nicolinc@...dia.com, kvm@...r.kernel.org,
mjrosato@...ux.ibm.com, chao.p.peng@...ux.intel.com,
yi.y.sun@...ux.intel.com, peterx@...hat.com, jasowang@...hat.com,
shameerali.kolothum.thodi@...wei.com, lulu@...hat.com,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, zhenzhong.duan@...el.com,
joao.m.martins@...cle.com, xin.zeng@...el.com, yan.y.zhao@...el.com
Subject: Re: [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2)
On 12/9/2023 8:47 AM, Jason Gunthorpe wrote:
> On Fri, Nov 17, 2023 at 05:07:11AM -0800, Yi Liu wrote:
>
>> Take Intel VT-d as an example: the stage-1 translation table is the
>> I/O page table. As the diagram below shows, the guest I/O page table
>> pointer in GPA (guest physical address) is passed to the host and used
>> to perform the stage-1 address translation. Along with it, modifications
>> to present mappings in the guest I/O page table should be followed by an
>> IOTLB invalidation.
>
> I've been looking at what the three HWs need for invalidation; it is
> a bit messy. Here is my thinking. Please let me know if I got it right.
>
> What is the starting point of the guest memory walks:
> Intel: Single Scalable Mode PASID table entry indexed by a RID & PASID
> AMD: GCR3 table (a table of PASIDs) indexed by RID
The GCR3 table is indexed by PASID.
The Device Table (DTE) is indexed by DeviceID (RID).
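
So the guest walk on AMD is roughly two levels:

  DeviceID (RID) -> Device Table Entry (DTE) -> GCR3 table base
  PASID          -> GCR3 table entry         -> guest page table root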
> ...
> Will ATC invalidations be forwarded or synthesized:
> Intel: The (vDomain-ID, PASID) pair maps to a unique nesting domain, so
> the hypervisor knows exactly which RIDs this nesting domain is
> linked to and can generate an ATC invalidation. The plan is to
> suppress/discard the ATC invalidations from the VM and generate
> them in the hypervisor.
> AMD: (vDomain-ID, PASID) is ambiguous: it can refer to multiple GCR3
> tables. We know the maximal set of RIDs it may represent, but not
> the actual set. I expect AMD will forward the ATC invalidation
> to avoid over-invalidation.
Not sure I understand your description here.
For the AMD IOMMU INVALIDATE_IOMMU_PAGES command (i.e. invalidate the
IOMMU TLB), the hypervisor needs to map gDomainId->hDomainId and issue
the command on behalf of the VM, along with the PASID and GVA (or GVA
range) provided by the guest.

For the AMD IOMMU INVALIDATE_IOTLB_PAGES command (i.e. invalidate the
ATC on the device), the hypervisor needs to map gDeviceId->hDeviceId and
issue the command on behalf of the VM, along with the PASID and GVA (or
GVA range) provided by the guest.
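
In code, that remapping amounts to something like the sketch below. The
struct layouts and helper names (lookup_hdomid(), lookup_hdevid(),
issue_host_cmd()) are purely illustrative, not the actual driver API:

#include <stdint.h>

/* Illustrative command layouts; only the fields relevant here. */
struct inv_iommu_pages {
	uint16_t domain_id;	/* DomainID tag of the IOMMU TLB entries */
	uint32_t pasid;		/* passed through unchanged */
	uint64_t gva;		/* GVA (or start of range), unchanged */
	uint64_t size;
};

struct inv_iotlb_pages {
	uint16_t device_id;	/* target device for the ATC flush */
	uint32_t pasid;		/* passed through unchanged */
	uint64_t gva;
	uint64_t size;
};

/* Hypothetical helpers: ID remapping tables and host command queue. */
int lookup_hdomid(uint16_t gdomid, uint16_t *hdomid);
int lookup_hdevid(uint16_t gdevid, uint16_t *hdevid);
int issue_host_cmd(void *cmd);

static int handle_inv_iommu_pages(struct inv_iommu_pages *cmd)
{
	/* gDomainId -> hDomainId, everything else passes through */
	if (lookup_hdomid(cmd->domain_id, &cmd->domain_id))
		return -1;
	return issue_host_cmd(cmd);
}

static int handle_inv_iotlb_pages(struct inv_iotlb_pages *cmd)
{
	/* gDeviceId -> hDeviceId, everything else passes through */
	if (lookup_hdevid(cmd->device_id, &cmd->device_id))
		return -1;
	return issue_host_cmd(cmd);
}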
> ARM: ASID is ambiguous. We have no idea which Nesting Domain/CD table
> the ASID is contained in. ARM must forward the ATC invalidation
> from the guest.
>
> What iommufd object should receive the IOTLB invalidation command list:
> Intel: The Nesting domain. The command list has to be broken up per
> (vDomain-ID,PASID) and that batch delivered to the single
> nesting domain. Kernel ignores vDomain-ID/PASID and just
> invalidates whatever the nesting domain is actually attached to
> AMD: Any Nesting Domain in the vDomain-ID group. The command list has
> to be broken up per (vDomain-ID). Kernel replaces
> vDomain-ID with pDomain-ID from the nesting domain and executes
> the invalidation.
> ARM: The Nesting Parent domain. Kernel forces the VMID from the
> Nesting Parent and executes the invalidation.
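
(If I follow, the VMM-side batching this implies would look roughly like
the sketch below. target_object_for() and invalidate_object() are
placeholders for the per-vendor keying and for whatever uAPI call each
batch ends up using:)

#include <stddef.h>
#include <stdint.h>

struct inv_cmd {		/* guest command, simplified */
	uint32_t vdom_id;
	uint32_t pasid;
	uint64_t addr;
	uint64_t size;
};

/*
 * Placeholder for the per-vendor keying:
 *   Intel: (vDomain-ID, PASID) -> the one nesting domain
 *   AMD:   vDomain-ID          -> any nesting domain in that group
 *   ARM:   everything          -> the nesting parent domain
 */
uint32_t target_object_for(const struct inv_cmd *cmd);
int invalidate_object(int iommufd, uint32_t obj_id,
		      const struct inv_cmd *cmds, size_t n);

/* Split the guest's list into batches keyed by the receiving object. */
static int deliver_guest_cmds(int iommufd, const struct inv_cmd *cmds,
			      size_t n)
{
	size_t start = 0, i;
	uint32_t cur = 0;

	for (i = 0; i < n; i++) {
		uint32_t obj = target_object_for(&cmds[i]);

		if (i == start) {
			cur = obj;
			continue;
		}
		if (obj != cur) {
			/* target changed: flush the batch so far */
			if (invalidate_object(iommufd, cur,
					      &cmds[start], i - start))
				return -1;
			start = i;
			cur = obj;
		}
	}
	if (n > start)
		return invalidate_object(iommufd, cur,
					 &cmds[start], n - start);
	return 0;
}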
>
> In all cases the VM issues an ATC invalidation with (vRID, PASID) as
> the tag. The VMM must translate vRID -> dev_id -> pRID
>
> For a pure SW flow the vRID can be mapped to the dev_id and the ATC
> invalidation delivered to the device object (e.g. IOMMUFD_DEV_INVALIDATE)
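
(Purely speculative, but if IOMMUFD_DEV_INVALIDATE followed the shape of
the existing HWPT invalidation uAPI, it might look like this; every
field name below is made up:)

#include <stdint.h>

/*
 * Speculative sketch of a possible IOMMUFD_DEV_INVALIDATE payload.
 * The VMM resolves vRID -> dev_id and hands the guest's ATC
 * invalidation commands to the kernel, which supplies the pRID.
 */
struct iommu_dev_invalidate {
	uint32_t size;		/* sizeof(struct iommu_dev_invalidate) */
	uint32_t dev_id;	/* from the VMM's vRID -> dev_id map */
	uint32_t data_type;	/* vendor command format */
	uint32_t entry_len;	/* bytes per command entry */
	uint32_t entry_num;	/* in: entries given; out: consumed */
	uint64_t data_uptr;	/* user pointer to the command array */
};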
>
> Finally, we have the HW driven invalidation DMA queues that can be
> directly assigned to the guest. AMD and SMMUv3+vCMDQ support this. In
> this case the HW is directly processing invalidation commands without
> a hypervisor trap.
>
> To make this work the iommu needs to be programmed with:
> AMD: A vDomain-ID -> pDomain-ID table
> A vRID -> pRID table
> This is all bound to some "virtual function"
By "virtual function", I assume you are referring to the AMD vIOMMU
instance in the guest?
> ARM: A vRID -> pRID table
> The vCMDQ is bound to a VM_ID, so to the Nesting Parent
>
> For AMD, as above, I suggest the vDomain-ID be passed when creating
> the nesting domain
Sure, we can do this part.
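
Presumably the vDomain-ID would ride in the vendor data of the HWPT
allocation, e.g. something like the hypothetical layout below (none of
these field names come from a real header):

#include <stdint.h>

/*
 * Hypothetical AMD driver data for the nesting-domain allocation,
 * just to show where the vDomain-ID could be carried.
 */
struct iommu_hwpt_amd_guest {
	uint64_t gcr3_base;	/* GPA of the guest's GCR3 table */
	uint32_t gdom_id;	/* vDomain-ID the guest tags TLB entries with */
	uint32_t flags;
};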
> The AMD "virtual function"... It is probably best to create a new
> iommufd object for this, and it can be passed in to a few places.
Something like IOMMUFD_OBJ_VIOMMU? Then its operations would include
something like:
* Init
* Destroy
* ...
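
In kernel terms, maybe something shaped like the sketch below (struct
iommufd_object/iommufd_ctx are the existing iommufd internals; the rest
is made up to make the idea concrete):

/* Placeholder sketch of a per-instance vIOMMU object and its ops. */
struct iommufd_viommu {
	struct iommufd_object obj;
	struct iommufd_ctx *ictx;
	/* AMD: the vDomain-ID -> pDomain-ID and vRID -> pRID tables
	 * for the hardware would hang off here */
};

struct iommufd_viommu_ops {
	int (*init)(struct iommufd_viommu *viommu);
	void (*destroy)(struct iommufd_viommu *viommu);
	/* ... */
};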
> The vRID->pRID table should be some mostly common
> IOMMUFD_DEV_ASSIGN_VIRTUAL_ID. AMD will need to pass in the virtual
> function ID and ARM will need to pass in the Nesting Parent ID.
Ok.
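
(A speculative shape for that uAPI, with the per-vendor "container"
distinguishing the AMD virtual function from the ARM Nesting Parent;
all field names below are invented:)

#include <stdint.h>

/* Speculative sketch of an IOMMUFD_DEV_ASSIGN_VIRTUAL_ID payload. */
struct iommu_dev_assign_virtual_id {
	uint32_t size;		/* sizeof(struct iommu_dev_assign_virtual_id) */
	uint32_t dev_id;	/* iommufd device object, i.e. the pRID side */
	uint32_t id_type;	/* e.g. RID */
	uint32_t container_id;	/* AMD: virtual function object;
				   ARM: Nesting Parent HWPT */
	uint64_t virtual_id;	/* the vRID as the guest knows it */
};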
> ...
> Thus next steps:
> - Respin this and let's focus on Intel only (this will be tough for
> the holidays, but if it is available I will try)
> - Get an ARM patch that just does IOTLB invalidation and add it to my
> part 3
> - Start working on IOMMUFD_DEV_INVALIDATE along with an ARM
> implementation of it
> - Reorganize the AMD RFC broadly along these lines and let's see it
> freshened up in the next months as well. I would like to see the
> AMD support structured to implement the SW paths in first steps and
> later add in the "virtual function" acceleration stuff. The latter
> is going to be complex.
I am working on refining part 1 to add HW info reporting and nested
translation (minus the invalidation stuff), and should be sending it out
soon.
Suravee