Message-ID: <aA/exylmYJhIhEVL@Asurada-Nvidia>
Date: Mon, 28 Apr 2025 13:02:15 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: Vasant Hegde <vasant.hegde@....com>
CC: <jgg@...dia.com>, <kevin.tian@...el.com>, <corbet@....net>,
<will@...nel.org>, <bagasdotme@...il.com>, <robin.murphy@....com>,
<joro@...tes.org>, <thierry.reding@...il.com>, <vdumpa@...dia.com>,
<jonathanh@...dia.com>, <shuah@...nel.org>, <jsnitsel@...hat.com>,
<nathan@...nel.org>, <peterz@...radead.org>, <yi.l.liu@...el.com>,
<mshavit@...gle.com>, <praan@...gle.com>, <zhangzekun11@...wei.com>,
<iommu@...ts.linux.dev>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
<linux-tegra@...r.kernel.org>, <linux-kselftest@...r.kernel.org>,
<patches@...ts.linux.dev>, <mochs@...dia.com>, <alok.a.tiwari@...cle.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: [PATCH v2 10/22] iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC
ioctl
On Mon, Apr 28, 2025 at 05:42:27PM +0530, Vasant Hegde wrote:
> > +/**
> > + * struct iommu_vcmdq_alloc - ioctl(IOMMU_VCMDQ_ALLOC)
> > + * @size: sizeof(struct iommu_vcmdq_alloc)
> > + * @flags: Must be 0
> > + * @viommu_id: Virtual IOMMU ID to associate the virtual command queue with
> > + * @type: One of enum iommu_vcmdq_type
> > + * @index: The logical index to the virtual command queue per virtual IOMMU, for
> > + * a multi-queue model
> > + * @out_vcmdq_id: The ID of the new virtual command queue
> > + * @addr: Base address of the queue memory in the guest physical address space
>
> Sorry. I didn't get this part.
>
> So here `addr` is command queue base address like
> - NVIDIA's virtual command queue
> - AMD vIOMMU's command buffer
>
> .. and it will allocate vcmdq for each buffer type. Is that the correct
> understanding?
Yes. For AMD "vIOMMU", it needs a new type for iommufd vIOMMU:
IOMMU_VIOMMU_TYPE_AMD_VIOMMU,
For AMD "vIOMMU" command buffer, it needs a new type too:
IOMMU_VCMDQ_TYPE_AMD_VIOMMU, /* Kdoc it to be Command Buffer */
Then, use the IOMMUFD_CMD_VIOMMU_ALLOC ioctl to allocate a vIOMMU
object, and use IOMMUFD_CMD_VCMDQ_ALLOC ioctl(s) to allocate vCMDQ
objects.
> In case of AMD vIOMMU, buffer base address is programmed in different register
> (ex: MMIO Offset 0008h Command Buffer Base Address Register) and buffer
> enable/disable is done via different register (ex: MMIO Offset 0018h IOMMU
> Control Register). And we need to communicate both to hypervisor. Not sure this
> API can accommodate this as addr seems to be mandatory.
NVIDIA's CMDQV has all three of them too. What we do here is let
the VMM trap the buffer base address (in guest physical address
space) and forward it to the kernel via this @addr. The kernel then
translates @addr to the host physical address space and programs
the physical address and size into the register.
> [1]
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf
Thanks for the doc. So, AMD has:
Command Buffer Base Address Register [MMIO Offset 0008h]
"used to program the system physical base address and size of the
command buffer. The command buffer occupies contiguous physical
memory starting at the programmed base address, up to the
programmed size."
Command Buffer Head Pointer Register [MMIO Offset 2000h]
Command Buffer Tail Pointer Register [MMIO Offset 2008h]
IIUC, AMD should do the same: the VMM traps the VM's Command Buffer
Base Address register when the guest kernel allocates a command
buffer by programming that register, to capture the guest PA and
size. Then, the VMM allocates a vCMDQ object (for this command
buffer), forwarding the buffer address and size via @addr and
@length to the host kernel. The kernel then translates the guest PA
to a host PA to program the HW.
We can see that the Head/Tail registers are in a different MMIO
page (offset by two 4K pages), much like NVIDIA CMDQV, which
allows the VMM to mmap that MMIO page of the Head/Tail registers
so the guest OS can directly control the HW (i.e. the VMM doesn't
trap these two registers).
When the guest OS wants to issue a new command, the guest kernel
can just fill the guest command buffer at the entry that the Tail
register points to, then program the Tail register (backed by an
mmap'd MMIO page); the HW will then read the queued commands from
the entry at Head up to the entry at Tail in the guest command
buffer.
> > @@ -170,3 +170,97 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> > iommufd_put_object(ucmd->ictx, &viommu->obj);
> > return rc;
> > }
> > +
> > +void iommufd_vcmdq_destroy(struct iommufd_object *obj)
> > +{
>
> I didn't understood destroy flow in general. Can you please help me to understand:
>
> VMM is expected to track all buffers and call this interface? OR iommufd will
> take care of it? What happens if VM crashes ?
In the normal flow, the VMM gets a vCMDQ object ID for each vCMDQ
object it allocates, so it should track all the IDs and release
them when the VM shuts down.
The iommufd core does track all the objects that belong to an
iommufd context (ictx) and automatically releases them. But it
can't resolve certain dependencies on other FDs: e.g. vEVENTQ and
FAULT QUEUE return another FD that user space listens to, and that
FD must be closed properly to destroy the queue object.
> > + /* The underlying physical pages must be pinned in the IOAS */
> > + rc = iopt_pin_pages(&viommu->hwpt->ioas->iopt, cmd->addr, cmd->length,
> > + pages, 0);
>
> Why do we need this? is it not pinned already as part of vfio binding?
I think this could be clearer:
/*
* The underlying physical pages must be pinned to prevent them from
* being unmapped (via IOMMUFD_CMD_IOAS_UNMAP) during the life cycle
* of the vCMDQ object.
*/
Thanks
Nicolin