Date:   Thu, 22 Jun 2023 18:15:17 -0700
From:   "Suthikulpanit, Suravee" <suravee.suthikulpanit@....com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
        kvm@...r.kernel.org, joro@...tes.org, robin.murphy@....com,
        yi.l.liu@...el.com, alex.williamson@...hat.com,
        nicolinc@...dia.com, baolu.lu@...ux.intel.com,
        eric.auger@...hat.com, pandoh@...gle.com, kumaranand@...gle.com,
        jon.grimm@....com, santosh.shukla@....com, vasant.hegde@....com,
        jay.chen@....com, joseph.chung@....com
Subject: Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated
 vIOMMU w/ nested page table

Jason,

On 6/22/2023 6:46 AM, Jason Gunthorpe wrote:
> On Wed, Jun 21, 2023 at 06:54:47PM -0500, Suravee Suthikulpanit wrote:
> 
>> Since the IOMMU hardware virtualizes the guest command buffer, IOMMU
>> operations such as invalidation of guest pages (i.e. stage 1) can be
>> accelerated when the command is issued by the guest kernel, without
>> intervention from the hypervisor.
> 
> This is similar to what we are doing on ARM as well.

Ok

>> This series is implemented on top of the IOMMUFD framework. It leverages
>> the existing APIs and ioctls for providing guest iommu information
>> (i.e. struct iommu_hw_info_amd), and allowing the guest to provide guest page
>> table information (i.e. struct iommu_hwpt_amd_v2) for setting up user
>> domain.
>>
>> Please see [4], [5], and [6] for more detail on the AMD HW-vIOMMU.
>>
>> NOTES
>> -----
>> This series is organized into two parts:
>>    * Part1: Preparing IOMMU driver for HW-vIOMMU support (Patch 1-8).
>>
>>    * Part2: Introducing HW-vIOMMU support (Patch 9-21).
>>
>>    * Patches 12 and 21 extend the existing IOMMUFD ioctls to support
>>      additional operations, which can be categorized into:
>>      - Ioctls to init/destroy AMD HW-vIOMMU instance
>>      - Ioctls to attach/detach guest devices to the AMD HW-vIOMMU instance.
>>      - Ioctls to attach/detach guest domains to the AMD HW-vIOMMU instance.
>>      - Ioctls to trap certain AMD HW-vIOMMU MMIO register accesses.
>>      - Ioctls to trap AMD HW-vIOMMU command buffer initialization.
> 
> No one else seems to need this kind of stuff, why is AMD different?
> 
> Emulation and mediation to create the vIOMMU is supposed to be in the
> VMM side, not in the kernel. I don't want to see different models by
> vendor.

These ioctls are not for emulation, which I agree should be done on the
VMM side (e.g. QEMU). They provide the information needed to program the
AMD IOMMU hardware for hardware-assisted IOMMU virtualization. This
includes programming certain data structures, i.e. the Domain ID mapping
table (DomIDMap) and the Device ID mapping table (DevIDMap), as well as
certain MMIO registers that control the HW-vIOMMU feature.

> Even stuff like setting up the gcr3 should not be its own ioctls,
> that is not how we are modeling things at all.

Sorry for the miscommunication regarding the ioctl for setting up gcr3
in the commit log message of patch 20, and for the confusion it caused.
I'll update the message accordingly. Please allow me to clarify briefly
here.

In this series, the AMD IOMMU GCR3 table is actually set up when
IOMMUFD_CMD_HWPT_ALLOC is called, for which the driver provides a hook
via struct iommu_ops.domain_alloc_user(). The AMD-specific information
is communicated from QEMU via iommu_domain_user_data.iommu_hwpt_amd_v2.
This is similar to Intel and ARM.
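
Just to illustrate my mental model of that allocation path, here is a
rough, self-contained sketch. The types and field names below
(hwpt_amd_v2_example, gcr3_root_gpa, gdom_id) are placeholders I made up
for the illustration, not the actual structures or uAPI in the series:

/* Illustrative only: stand-in types, not the real kernel structures. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder for the AMD-specific data carried by
 * IOMMUFD_CMD_HWPT_ALLOC; the real iommu_hwpt_amd_v2 layout is defined
 * by the series, these fields are made up for the sketch. */
struct hwpt_amd_v2_example {
    uint64_t gcr3_root_gpa;  /* guest-physical address of the GCR3 table */
    uint32_t gdom_id;        /* guest domain ID programmed in the gDTE */
    uint32_t flags;
};

struct user_domain_example {
    uint64_t gcr3_root_gpa;
    uint32_t gdom_id;
};

/* Rough shape of what a domain_alloc_user() hook would do with the
 * user-provided data: validate it and stash what is needed to set up
 * the stage-1 (v2) page table for the device. */
static struct user_domain_example *
example_domain_alloc_user(const struct hwpt_amd_v2_example *udata)
{
    struct user_domain_example *dom;

    if (udata->gcr3_root_gpa & 0xfff)  /* table root must be 4K aligned */
        return NULL;

    dom = calloc(1, sizeof(*dom));
    if (!dom)
        return NULL;
    dom->gcr3_root_gpa = udata->gcr3_root_gpa;
    dom->gdom_id = udata->gdom_id;
    return dom;
}

int main(void)
{
    struct hwpt_amd_v2_example udata = {
        .gcr3_root_gpa = 0x123456000ULL, .gdom_id = 1,
    };
    struct user_domain_example *dom = example_domain_alloc_user(&udata);

    if (!dom)
        return 1;
    printf("stage-1 gcr3 root gpa 0x%llx for gDomID %u\n",
           (unsigned long long)dom->gcr3_root_gpa, (unsigned)dom->gdom_id);
    free(dom);
    return 0;
}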

Please also note that for the AMD HW-vIOMMU device model in QEMU, the
guest memory used for the IOMMU device table is write-trapped, so QEMU
sees when the guest IOMMU driver programs a guest Device Table Entry
(gDTE). QEMU then reads the content of the gDTE to extract the
information needed to set up the guest (stage-1) page table, and calls
iommufd_backend_alloc_hwpt().
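
To make that device-model flow concrete, here is a rough sketch of the
QEMU-side steps. The gDTE layout below is a made-up, truncated stand-in,
and alloc_hwpt_example() only models what would eventually be handed to
iommufd_backend_alloc_hwpt(); it is not the real helper:

/* Illustrative QEMU-side flow, with simplified stand-in types. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A made-up, truncated view of a guest Device Table Entry (gDTE);
 * only the bits this sketch cares about. */
struct gdte_example {
    bool     valid;
    bool     tv;             /* translation information valid */
    uint16_t gdom_id;        /* guest domain ID */
    uint64_t gcr3_root_gpa;  /* stage-1 (guest) CR3 table root */
};

/* Stand-in that just reports what would be passed down to the kernel
 * via IOMMUFD_CMD_HWPT_ALLOC. */
static int alloc_hwpt_example(uint32_t dev_id, const struct gdte_example *gdte)
{
    printf("alloc hwpt: dev 0x%x, gDomID %u, gcr3 @ 0x%llx\n",
           (unsigned)dev_id, (unsigned)gdte->gdom_id,
           (unsigned long long)gdte->gcr3_root_gpa);
    return 0;
}

/* Called when a write to the trapped guest device table is observed. */
static int on_gdte_write_example(uint32_t dev_id, const struct gdte_example *gdte)
{
    if (!gdte->valid || !gdte->tv)
        return 0;  /* nothing to set up yet */
    return alloc_hwpt_example(dev_id, gdte);
}

int main(void)
{
    struct gdte_example gdte = {
        .valid = true, .tv = true,
        .gdom_id = 5, .gcr3_root_gpa = 0x200000ULL,
    };
    return on_gdte_write_example(0x10, &gdte);
}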

There is still work to be done here to fully support PASID. I'll take a
look at that next.

> I think you need to take smaller steps in line with the other
> drivers so we can all progress through this step by step together.

I can certainly break the patch series down into smaller parts to align
with the rest.

> To start focus only on user space page tables and kernel mediated
> invalidation and fit into the same model as everyone else. This is
> approx the same patches and uAPI you see for ARM and Intel. AFAICT
> AMD's HW is very similar to ARM's, so you should be aligning to the
> ARM design.

I think the user-space page table part is covered as described above.

As for kernel-mediated invalidation, IIUC from looking at these patches:

* iommufd: Add nesting related data structures for ARM SMMUv3
(https://github.com/yiliu1765/iommufd/commit/b6a5c8991dcc96ca895b53175c93e5fc522f42fe)

* iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
(https://github.com/yiliu1765/iommufd/commit/0ae59149474ad0cb8a42ff7e71ed6b4e9df00204) 


it seems that user space is supposed to call the
IOMMUFD_CMD_HWPT_INVALIDATE ioctl for both Intel and ARM to issue
invalidations for the stage-1 page table. Please let me know if I
misunderstand the purpose of this ioctl.
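
For reference, this is roughly how I picture user space driving that
path. The struct layout and helper below are placeholders standing in
for the proposed uAPI, not the actual definitions from those branches:

/* Illustrative only: placeholder request shape, not the real uAPI. */
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the invalidation request the VMM would forward when it
 * sees a stage-1 TLB invalidation from the guest. */
struct hwpt_invalidate_example {
    uint32_t hwpt_id;  /* HWPT returned by IOMMUFD_CMD_HWPT_ALLOC */
    uint32_t pasid;
    uint64_t iova;     /* guest virtual address to invalidate */
    uint64_t size;
};

/* Stand-in for ioctl(iommufd, IOMMUFD_CMD_HWPT_INVALIDATE, ...). */
static int hwpt_invalidate_example(const struct hwpt_invalidate_example *req)
{
    printf("invalidate hwpt %u: pasid %u, iova 0x%llx, size 0x%llx\n",
           (unsigned)req->hwpt_id, (unsigned)req->pasid,
           (unsigned long long)req->iova, (unsigned long long)req->size);
    return 0;
}

int main(void)
{
    struct hwpt_invalidate_example req = {
        .hwpt_id = 3, .pasid = 0,
        .iova = 0x7f0000000000ULL, .size = 0x1000,
    };
    return hwpt_invalidate_example(&req);
}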

However, for AMD the HW-vIOMMU virtualizes the guest command buffer.
When it sees a page-table invalidation command in the guest command
buffer, it handles the invalidation itself using the information in the
DomIDMap, which maps the guest domain ID (gDomID) of a particular guest
to the corresponding host domain ID (hDomID) for the device, and
invalidates the nested translation according to the specified PASID,
DomID, and GVA.
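
Conceptually, the lookup the hardware performs is something like the
following toy model (a plain array standing in for the DomIDMap; this is
not the in-memory table format defined by the IOMMU specification):

/* Toy model of the DomIDMap lookup, not the real table format. */
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_MAX_GDOMID 64

/* Per-guest table indexed by guest domain ID; 0 means "not mapped". */
static uint16_t domid_map_example[EXAMPLE_MAX_GDOMID];

/* What the HW-vIOMMU effectively does when it sees an
 * INVALIDATE_IOMMU_PAGES command in the guest command buffer:
 * translate gDomID to hDomID, then invalidate the nested translation
 * for the given (hDomID, PASID, GVA range). */
static int handle_guest_invalidate_example(uint16_t gdom_id, uint32_t pasid,
                                           uint64_t gva, uint64_t size)
{
    uint16_t hdom_id;

    if (gdom_id >= EXAMPLE_MAX_GDOMID || !domid_map_example[gdom_id])
        return -1;  /* guest domain not attached */

    hdom_id = domid_map_example[gdom_id];
    printf("invalidate hDomID %u (gDomID %u), pasid %u, gva 0x%llx size 0x%llx\n",
           (unsigned)hdom_id, (unsigned)gdom_id, (unsigned)pasid,
           (unsigned long long)gva, (unsigned long long)size);
    return 0;
}

int main(void)
{
    domid_map_example[5] = 21;  /* gDomID 5 -> hDomID 21 */
    return handle_guest_invalidate_example(5, 0, 0x100000ULL, 0x1000ULL);
}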

The DomIDMap is set up by the host IOMMU driver during VM
initialization. When the guest IOMMU driver attaches a VFIO pass-through
device to a guest iommu_group (i.e. domain), it programs the gDTE with
the gDomID. This write is trapped by QEMU, which reads the gDomID from
the gDTE and communicates it to the hypervisor via the newly proposed
VIOMMU_DOMAIN_ATTACH ioctl. At that point the DomIDMap entry is created
for the VFIO device.
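
Continuing the toy model above, the host-side effect of that attach
would be roughly the following (the field names of
viommu_domain_attach_example are placeholders, not the proposed
VIOMMU_DOMAIN_ATTACH payload):

/* Toy model of what the attach records; placeholder fields only. */
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_MAX_GDOMID 64

static uint16_t domid_map_example[EXAMPLE_MAX_GDOMID];

struct viommu_domain_attach_example {
    uint32_t device_id;  /* host (physical) device ID */
    uint16_t gdom_id;    /* read from the trapped gDTE */
    uint16_t hdom_id;    /* host domain backing the nested HWPT */
};

/* Attach: make the guest's domain ID resolve to the host domain ID so
 * later guest invalidation commands can be handled in hardware. */
static int viommu_domain_attach_example(const struct viommu_domain_attach_example *req)
{
    if (req->gdom_id >= EXAMPLE_MAX_GDOMID)
        return -1;
    domid_map_example[req->gdom_id] = req->hdom_id;
    printf("dev 0x%x: gDomID %u -> hDomID %u\n",
           (unsigned)req->device_id, (unsigned)req->gdom_id,
           (unsigned)req->hdom_id);
    return 0;
}

int main(void)
{
    struct viommu_domain_attach_example req = {
        .device_id = 0x10, .gdom_id = 5, .hdom_id = 21,
    };
    return viommu_domain_attach_example(&req);
}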

> Then maybe we can argue if a kernel vIOMMU emulation/mediation is
> appropriate or not, but this series is just too much as is.

Sure, we can continue to discuss the implementation details for each
part separately.

> I also want to see the AMD driver align with the new APIs for
> PASID/etc before we start shoveling more stuff into it.

Are you referring to the IOMMU API for SVA/PASID stuff:
   * struct iommu_domain_ops.set_dev_pasid()
   * struct iommu_ops.remove_dev_pasid()
   * ...

If so, we are working on it separately in parallel, and will be sending
out an RFC soon.

Otherwise, could you please point me to which "new APIs for PASID/etc"
you are referring to in particular? I might have missed something here.

> This is going to be part of the iommufd contract as well; I'm very
> unhappy to see drivers picking and choosing which parts of the
> contract they implement.

Sorry, didn't mean to disappoint :) Let me look into this part more and
will try to be more compliant with the contract in the next RFC.

Thanks,
Suravee
