Message-ID: <391ab316-79b1-4535-a45b-4c01bfb80de6@amd.com>
Date:   Tue, 12 Dec 2023 00:35:26 +0700
From:   "Suthikulpanit, Suravee" <suravee.suthikulpanit@....com>
To:     Jason Gunthorpe <jgg@...dia.com>, Yi Liu <yi.l.liu@...el.com>,
        "Giani, Dhaval" <Dhaval.Giani@....com>,
        Vasant Hegde <vasant.hegde@....com>
Cc:     joro@...tes.org, alex.williamson@...hat.com, kevin.tian@...el.com,
        robin.murphy@....com, baolu.lu@...ux.intel.com, cohuck@...hat.com,
        eric.auger@...hat.com, nicolinc@...dia.com, kvm@...r.kernel.org,
        mjrosato@...ux.ibm.com, chao.p.peng@...ux.intel.com,
        yi.y.sun@...ux.intel.com, peterx@...hat.com, jasowang@...hat.com,
        shameerali.kolothum.thodi@...wei.com, lulu@...hat.com,
        iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, zhenzhong.duan@...el.com,
        joao.m.martins@...cle.com, xin.zeng@...el.com, yan.y.zhao@...el.com
Subject: Re: [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2)



On 12/9/2023 8:47 AM, Jason Gunthorpe wrote:
> On Fri, Nov 17, 2023 at 05:07:11AM -0800, Yi Liu wrote:
> 
>> Take Intel VT-d as an example, the stage-1 translation table is I/O page
>> table. As the below diagram shows, guest I/O page table pointer in GPA
>> (guest physical address) is passed to host and be used to perform the stage-1
>> address translation. Along with it, modifications to present mappings in the
>> guest I/O page table should be followed with an IOTLB invalidation.
> 
> I've been looking at what the three HW's need for invalidation, it is
> a bit messy.. Here is my thinking. Please let me know if I got it right
> 
> What is the starting point of the guest memory walks:
>   Intel: Single Scalable Mode PASID table entry indexed by a RID & PASID
>   AMD: GCR3 table (a table of PASIDs) indexed by RID

GCR3 table is indexed by PASID.
The Device Table (whose entries are DTEs) is indexed by DeviceID (RID).
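
To make the two levels of indexing concrete, here is a rough C sketch. It
is only an illustration: every name carrying the sketch_ prefix, the flat
single-level table layout and the tiny table sizes are assumptions made up
for this example, and the real hardware formats are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only; real tables are much larger. */
#define SKETCH_MAX_DEVICES 4
#define SKETCH_MAX_PASIDS  8

/* One GCR3 table: one guest CR3 (stage-1 page table root, a GPA) per PASID. */
struct sketch_gcr3_table {
	uint64_t gcr3[SKETCH_MAX_PASIDS];
};

/* One Device Table Entry: selected by DeviceID (RID), points at a GCR3 table. */
struct sketch_dte {
	struct sketch_gcr3_table *gcr3_table;	/* NULL if no guest table is set */
};

/* The Device Table itself, indexed by DeviceID. */
static struct sketch_dte sketch_dev_table[SKETCH_MAX_DEVICES];

/*
 * Walk DeviceID -> DTE -> GCR3 table -> entry for PASID.
 * Returns the guest page table root, or 0 if nothing is configured.
 */
static uint64_t sketch_lookup_gcr3(uint16_t devid, uint32_t pasid)
{
	const struct sketch_gcr3_table *table;

	if (devid >= SKETCH_MAX_DEVICES || pasid >= SKETCH_MAX_PASIDS)
		return 0;

	table = sketch_dev_table[devid].gcr3_table;
	return table ? table->gcr3[pasid] : 0;
}
```

So the RID only selects the DTE, and the PASID only selects the entry
inside the GCR3 table that the DTE points to.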

> ...
> Will ATC be forwarded or synthesized:
>   Intel: The (vDomain-ID,PASID) is a unique nesting domain so
>          the hypervisor knows exactly which RIDs this nesting domain is
> 	linked to and can generate an ATC invalidation. Plan is to
> 	suppress/discard the ATC invalidations from the VM and generate
> 	them in the hypervisor.
>   AMD: (vDomain-ID,PASID) is ambiguous, it can refer to multiple GCR3
>        tables. We know which maximal set of RIDs it represents, but not
>        the actual set. I expect AMD will forward the ATC invalidation
>        to avoid over invalidation.

Not sure I understand your description here.

For the AMD IOMMU INVALIDATE_IOMMU_PAGES command (i.e. invalidate the IOMMU TLB),
the hypervisor needs to map gDomainId->hDomainId and issue the command 
on behalf of the VM along with the PASID and GVA (or GVA range) provided 
by the guest.

For the AMD IOMMU INVALIDATE_IOTLB_PAGES command (i.e. invalidate the ATC on the
device), the hypervisor needs to map gDeviceId->hDeviceId and issue the 
command on behalf of the VM along with the PASID and GVA (or GVA range) 
provided by the guest.
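
The two cases above can be sketched roughly as follows. This is a
simplified illustration, not the driver code: the sketch_ types, the
linear lookup tables and the command layout are all assumptions made up
for the example. The only point is that the hypervisor swaps the guest
ID for the host ID and passes the PASID and GVA range through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command kinds mirroring the two cases described above. */
enum sketch_cmd_type {
	SKETCH_INV_IOMMU_PAGES,	/* IOMMU TLB invalidation, tagged by DomainID */
	SKETCH_INV_IOTLB_PAGES	/* device ATC invalidation, tagged by DeviceID */
};

/* Hypothetical, simplified command; not the hardware command format. */
struct sketch_inv_cmd {
	enum sketch_cmd_type type;
	uint16_t id;		/* DomainID or DeviceID, depending on type */
	uint32_t pasid;
	uint64_t gva;
	uint64_t size;
};

struct sketch_id_map {
	uint16_t guest_id;
	uint16_t host_id;
};

/* Per-VM state assumed for this sketch: one table per ID space. */
struct sketch_vm {
	const struct sketch_id_map *domains;	/* gDomainId -> hDomainId */
	size_t nr_domains;
	const struct sketch_id_map *devices;	/* gDeviceId -> hDeviceId */
	size_t nr_devices;
};

static bool sketch_map_id(const struct sketch_id_map *map, size_t n,
			  uint16_t guest_id, uint16_t *host_id)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].guest_id == guest_id) {
			*host_id = map[i].host_id;
			return true;
		}
	}
	return false;
}

/*
 * Build the host command from the guest command. Only the ID is replaced;
 * PASID and the GVA range are copied as provided by the guest.
 * Returns false (leaving *host untouched) if the guest ID is unknown.
 */
static bool sketch_translate_inv(const struct sketch_vm *vm,
				 const struct sketch_inv_cmd *guest,
				 struct sketch_inv_cmd *host)
{
	uint16_t host_id = 0;
	bool ok = false;

	switch (guest->type) {
	case SKETCH_INV_IOMMU_PAGES:
		ok = sketch_map_id(vm->domains, vm->nr_domains,
				   guest->id, &host_id);
		break;
	case SKETCH_INV_IOTLB_PAGES:
		ok = sketch_map_id(vm->devices, vm->nr_devices,
				   guest->id, &host_id);
		break;
	}
	if (!ok)
		return false;

	*host = *guest;
	host->id = host_id;
	return true;
}
```

A command with an ID the hypervisor does not know about is rejected here
rather than forwarded; that policy is also just an assumption of the
sketch.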

>   ARM: ASID is ambiguous. We have no idea which Nesting Domain/CD table
>        the ASID is contained in. ARM must forward the ATC invalidation
>        from the guest.
> 
> What iommufd object should receive the IOTLB invalidation command list:
>   Intel: The Nesting domain. The command list has to be broken up per
>          (vDomain-ID,PASID) and that batch delivered to the single
> 	nesting domain. Kernel ignores vDomain-ID/PASID and just
> 	invalidates whatever the nesting domain is actually attached to
>   AMD: Any Nesting Domain in the vDomain-ID group. The command list has
>        to be broken up per (vDomain-ID). Kernel replaces
>        vDomain-ID with pDomain-ID from the nesting domain and executes
>        the invalidation.
>   ARM: The Nesting Parent domain. Kernel forces the VMID from the
>        Nesting Parent and executes the invalidation.
> 
> In all cases the VM issues an ATC invalidation with (vRID, PASID) as
> the tag. The VMM must translate vRID -> dev_id -> pRID
> 
> For a pure SW flow the vRID can be mapped to the dev_id and the ATC
> invalidation delivered to the device object (eg IOMMUFD_DEV_INVALIDATE)
> 
> Finally, we have the HW driven invalidation DMA queues that can be
> directly assigned to the guest. AMD and SMMUv3+vCMDQ support this. In
> this case the HW is directly processing invalidation commands without
> a hypervisor trap.
> 
> To make this work the iommu needs to be programmed with:
>   AMD: A vDomain-ID -> pDomain-ID table
>        A vRID -> pRID table
>        This is all bound to some "virtual function"

By "virtual function", I assume you are referring to the AMD vIOMMU 
instance in the guest?

>   ARM: A vRID -> pRID table
>        The vCMDQ is bound to a VM_ID, so to the Nesting Parent
> 
> For AMD, as above, I suggest the vDomain-ID be passed when creating
> the nesting domain

Sure, we can do this part.

> The AMD "virtual function".. It is probably best to create a new iommufd
> object for this and it can be passed in to a few places

Something like IOMMUFD_OBJ_VIOMMU? Then its operations would include
something like:
   * Init
   * Destroy
   * ...

> The vRID->pRID table should be some mostly common
> IOMMUFD_DEV_ASSIGN_VIRTUAL_ID. AMD will need to pass in the virtual
> function ID and ARM will need to pass in the Nesting Parent ID.

Ok.

> ...
> Thus next steps:
>   - Respin this and lets focus on Intel only (this will be tough for
>     the holidays, but if it is available I will try)
>   - Get an ARM patch that just does IOTLB invalidation and add it to my
>     part 3
>   - Start working on IOMMUFD_DEV_INVALIDATE along with an ARM
>     implementation of it
>   - Reorganize the AMD RFC broadly along these lines and lets see it
>     freshened up in the next months as well. I would like to see the
>     AMD support structured to implement the SW paths in first steps and
>     later add in the "virtual function" acceleration stuff. The latter
>     is going to be complex.

I am working on refining part 1 to add HW info reporting and nested
translation (minus the invalidation stuff). I should be sending it out soon.

Suravee
