Message-ID: <aCt0/kOwCn8wZJG0@Asurada-Nvidia>
Date: Mon, 19 May 2025 11:14:22 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: Vasant Hegde <vasant.hegde@....com>
CC: Jason Gunthorpe <jgg@...dia.com>, <kevin.tian@...el.com>,
	<corbet@....net>, <will@...nel.org>, <bagasdotme@...il.com>,
	<robin.murphy@....com>, <joro@...tes.org>, <thierry.reding@...il.com>,
	<vdumpa@...dia.com>, <jonathanh@...dia.com>, <shuah@...nel.org>,
	<jsnitsel@...hat.com>, <nathan@...nel.org>, <peterz@...radead.org>,
	<yi.l.liu@...el.com>, <mshavit@...gle.com>, <praan@...gle.com>,
	<zhangzekun11@...wei.com>, <iommu@...ts.linux.dev>,
	<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<linux-arm-kernel@...ts.infradead.org>, <linux-tegra@...r.kernel.org>,
	<linux-kselftest@...r.kernel.org>, <patches@...ts.linux.dev>,
	<mochs@...dia.com>, <alok.a.tiwari@...cle.com>
Subject: Re: [PATCH v4 11/23] iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC
 ioctl

On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
> Jason, Nicolin, Kevin,
> 
> 
> On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
> > On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
> >> +/**
> >> + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
> >> + * @size: sizeof(struct iommu_hw_queue_alloc)
> >> + * @flags: Must be 0
> >> + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
> >> + * @type: One of enum iommu_hw_queue_type
> >> + * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
> >> + *         model
> >> + * @out_hw_queue_id: The ID of the new HW queue
> >> + * @base_addr: Base address of the queue memory in guest physical address space
> >> + * @length: Length of the queue memory in the guest physical address space
> >> + *
> >> + * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
> >> + * allows HW to access a guest queue memory described by @base_addr and @length.
> >> + * Upon success, the underlying physical pages of the guest queue memory will be
> >> + * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
> >> + * destroyed.
> > 
> > Do we have way to make the pinning optional?
> > 
> > As I understand AMD's system the iommu HW itself translates the
> > base_addr through the S2 page table automatically, so it doesn't need
> > pinned memory and physical addresses but just the IOVA.
> 
> Correct. HW will translate GPA -> SPA automatically using below information.
> 
> The AMD IOMMU needs a special device ID to be set up with a GPA -> SPA mapping
> per VM, and it's programmed in the VF Control BAR (VFCntlMMIO Offset {16’b[GuestID],
> 6’b01_0000} Guest Miscellaneous Control Register). IOMMU HW will use this
> address for GPA to SPA translation for buffers like the command buffer.
> 
> So HW will use Base address (GPA), head/tail pointer to get the offset from
> Base. Then it will use GPA -> SPA translation.
> 
> 
> > 
> > Perhaps for this reason the pinning should be done with a function
> > call from the driver?
> 
> We still need to make sure memory allocated for page is present in memory so
> that IOMMU HW can access it.
> 
> Pinning at the time of guest boot is enough here -OR- do we need to increase
> reference in queue_alloc() path ?

For NVIDIA's vCMDQ that reads host PA directly, pages should be
pinned once when stage 2 mappings are created for the guest RAM,
and iommu_hw_queue_alloc() should pin the pages again to prevent
the gPA from being unmapped in the stage 2 page table. Otherwise
it will be a security hole, as HW continues to read the unmapped
memory through physical address space.
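
As a rough illustration of that argument, here is a toy C model of
the rule "a queue's pin blocks the unmap". Everything in it
(struct toy_ioas, toy_map(), toy_queue_pin(), toy_queue_unpin(),
toy_unmap()) is hypothetical and is not the iommufd code. It only
tracks the extra pin taken at queue allocation, not the pin taken
when the stage 2 mappings are created.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model for illustration only; not iommufd code. */
#define TOY_NPAGES 8

struct toy_ioas {
	bool mapped[TOY_NPAGES];       /* stage 2 mapping present for this guest page */
	unsigned int pins[TOY_NPAGES]; /* extra pins held by HW queues on this page */
};

/* True if pages [first, first + n) lie inside the toy address space. */
static bool toy_range_ok(size_t first, size_t n)
{
	return n <= TOY_NPAGES && first <= TOY_NPAGES - n;
}

/* Create stage 2 mappings for the range. Returns 0 on success, -1 on a bad range. */
static int toy_map(struct toy_ioas *ioas, size_t first, size_t n)
{
	if (!toy_range_ok(first, n))
		return -1;
	for (size_t i = first; i < first + n; i++)
		ioas->mapped[i] = true;
	return 0;
}

/* Queue allocation: take one extra pin on every page backing the queue. */
static int toy_queue_pin(struct toy_ioas *ioas, size_t first, size_t n)
{
	if (!toy_range_ok(first, n))
		return -1;
	/* Refuse to pin memory that has no stage 2 mapping. */
	for (size_t i = first; i < first + n; i++)
		if (!ioas->mapped[i])
			return -1;
	for (size_t i = first; i < first + n; i++)
		ioas->pins[i]++;
	return 0;
}

/* Queue destruction: drop the pins taken by toy_queue_pin(). */
static void toy_queue_unpin(struct toy_ioas *ioas, size_t first, size_t n)
{
	if (!toy_range_ok(first, n))
		return;
	for (size_t i = first; i < first + n; i++)
		if (ioas->pins[i] > 0)
			ioas->pins[i]--;
}

/* Unmap fails with -1 while any page in the range is still pinned by a queue. */
static int toy_unmap(struct toy_ioas *ioas, size_t first, size_t n)
{
	if (!toy_range_ok(first, n))
		return -1;
	for (size_t i = first; i < first + n; i++)
		if (ioas->pins[i] > 0)
			return -1;
	for (size_t i = first; i < first + n; i++)
		ioas->mapped[i] = false;
	return 0;
}
```

In this model, toy_unmap() on a range that overlaps a pinned queue
fails until toy_queue_unpin() runs, which is the property that
keeps HW from reading memory that is no longer mapped for the
guest.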

I understand that the AMD Command Buffer also needs the S2
mappings to be present in order to work correctly. But what
happens if the queue memory isn't pinned (or even gets unmapped)?
Will it raise a translation fault, or will HW read the unmapped
memory?

If it raises a fault, I think this is Jason's point: a security
hole would be unlikely, i.e. for AMD, having
iommu_hw_queue_alloc() pin the physical pages is likely optional.

Thanks
Nicolin