Message-ID: <b8338b47-6fbf-44ac-9b99-3555997c9f36@amd.com>
Date: Tue, 29 Apr 2025 11:04:06 +0530
From: Vasant Hegde <vasant.hegde@....com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: jgg@...dia.com, kevin.tian@...el.com, corbet@....net, will@...nel.org,
bagasdotme@...il.com, robin.murphy@....com, joro@...tes.org,
thierry.reding@...il.com, vdumpa@...dia.com, jonathanh@...dia.com,
shuah@...nel.org, jsnitsel@...hat.com, nathan@...nel.org,
peterz@...radead.org, yi.l.liu@...el.com, mshavit@...gle.com,
praan@...gle.com, zhangzekun11@...wei.com, iommu@...ts.linux.dev,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-tegra@...r.kernel.org,
linux-kselftest@...r.kernel.org, patches@...ts.linux.dev, mochs@...dia.com,
alok.a.tiwari@...cle.com,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: [PATCH v2 10/22] iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC
ioctl
Hi Nicolin,
On 4/29/2025 1:32 AM, Nicolin Chen wrote:
> On Mon, Apr 28, 2025 at 05:42:27PM +0530, Vasant Hegde wrote:
>>> +/**
>>> + * struct iommu_vcmdq_alloc - ioctl(IOMMU_VCMDQ_ALLOC)
>>> + * @size: sizeof(struct iommu_vcmdq_alloc)
>>> + * @flags: Must be 0
>>> + * @viommu_id: Virtual IOMMU ID to associate the virtual command queue with
>>> + * @type: One of enum iommu_vcmdq_type
>>> + * @index: The logical index to the virtual command queue per virtual IOMMU, for
>>> + * a multi-queue model
>>> + * @out_vcmdq_id: The ID of the new virtual command queue
>>> + * @addr: Base address of the queue memory in the guest physical address space
>>
>> Sorry. I didn't get this part.
>>
>> So here `addr` is command queue base address like
>> - NVIDIA's virtual command queue
>> - AMD vIOMMU's command buffer
>>
>> .. and it will allocate vcmdq for each buffer type. Is that the correct
>> understanding?
>
> Yes. For AMD "vIOMMU", it needs a new type for iommufd vIOMMU:
> IOMMU_VIOMMU_TYPE_AMD_VIOMMU,
>
> For AMD "vIOMMU" command buffer, it needs a new type too:
> IOMMU_VCMDQ_TYPE_AMD_VIOMMU, /* Kdoc it to be Command Buffer */

Are you suggesting that we define one type for AMD and use it for all buffers
(command buffer, event log, PPR buffer, etc.), and then use
iommu_vcmdq_alloc->index to identify the buffer type?

>
> Then, use IOMMUFD_CMD_VIOMMU_ALLOC ioctl to allocate an vIOMMU
> obj, and use IOMMUFD_CMD_VCMDQ_ALLOC ioctl(s) to allocate vCMDQ
> objs.
>
>> In case of AMD vIOMMU, buffer base address is programmed in different register
>> (ex: MMIO Offset 0008h Command Buffer Base Address Register) and buffer
>> enable/disable is done via different register (ex: MMIO Offset 0018h IOMMU
>> Control Register). And we need to communicate both to hypervisor. Not sure this
>> API can accommodate this as addr seems to be mandatory.
>
> NVIDIA's CMDQV has all three of them too. What we do here is to
> let VMM trap the buffer base address (in guest physical address
> space) and forward it to kernel using this @addr. Then, kernel
> will translate this @addr to host physical address space, and
> program the physical address and size to the register.

Right. For the AMD IOMMU, the first 4K of MMIO space (which contains all the
buffer base address registers) is not accelerated, so we can trap it and pass
the GPA and size to iommufd.
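
To make the trap-and-forward step concrete, here is a rough sketch of how a VMM might decode a trapped write to the Command Buffer Base Address Register into the guest PA and byte length it would pass as @addr and @length. The field layout (ComBase in bits 51:12, ComLen in bits 59:56 as a power-of-two entry count, 16-byte entries) is my reading of the AMD IOMMU spec and should be checked against the document; `struct trapped_cmdbuf` and `decode_cmdbuf_base()` are hypothetical names, not part of this series.

```c
#include <stdint.h>

/*
 * Assumed layout of the Command Buffer Base Address Register
 * (MMIO offset 0x08): ComBase in bits 51:12, ComLen in bits 59:56.
 * Verify against the spec before relying on it.
 */
#define CMDBUF_BASE_MASK   0x000FFFFFFFFFF000ULL
#define CMDBUF_LEN_SHIFT   56
#define CMDBUF_LEN_MASK    0xFULL
#define CMDBUF_ENTRY_BYTES 16u

/* Hypothetical container for what the VMM would forward to the kernel. */
struct trapped_cmdbuf {
	uint64_t addr;   /* guest physical base address, for @addr */
	uint64_t length; /* buffer size in bytes, for @length */
};

/* Split a trapped register value into base address and size in bytes. */
static struct trapped_cmdbuf decode_cmdbuf_base(uint64_t reg)
{
	struct trapped_cmdbuf q;
	uint64_t len_log2 = (reg >> CMDBUF_LEN_SHIFT) & CMDBUF_LEN_MASK;

	q.addr = reg & CMDBUF_BASE_MASK;
	q.length = (1ULL << len_log2) * CMDBUF_ENTRY_BYTES;
	return q;
}
```

With this layout, a guest that programs a 512-entry buffer would yield a length of 8 KiB.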

But programming the base register (like the Command Buffer Base Address
Register) is not sufficient. We also have to enable the command buffer by
setting a particular bit in the Control register. So at a high level, the flow
is something like the below (@Suravee, correct me if I missed something here).

From the guest side:
  - Write the command buffer base address and size (MMIO offset 0x08)
  - Set MMIO offset 0x18[bit 12]

  We also need to program a few other bits that are not related to these
  buffers, like `Completion wait interrupt enable`.

From the VMM side:
  - Trap both registers and pass them to iommufd

From the host AMD IOMMU driver:
  - Program VFCntlMMIO Offset {16'b[GuestID], 6'b10_0000}

We need a way to pass the Control register details to iommufd -> AMD driver so
that we can program the VF control MMIO register.

Since the iommu_vcmdq_alloc structure doesn't have user_data, how do we
communicate the control register?
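
As an illustration of the two pieces of state involved, the sketch below computes the VFCntlMMIO offset from a GuestID and tests bit 12 of a trapped Control register value. It assumes {16'b[GuestID], 6'b10_0000} is a plain bit concatenation, i.e. the GuestID shifted left by 6 with 0x20 in the low bits; the helper names are made up for this example.

```c
#include <stdint.h>

/* Bit 12 of the IOMMU Control Register (MMIO offset 0x18), per the flow above. */
#define CONTROL_BIT_CMDBUF 12

/*
 * Assumption: {16'b[GuestID], 6'b10_0000} is a bit concatenation, so the
 * 16-bit GuestID lands in bits 21:6 and the low six bits are 10_0000b.
 */
static uint32_t vf_cntl_mmio_offset(uint16_t guest_id)
{
	return ((uint32_t)guest_id << 6) | 0x20u;
}

/* Report whether a trapped guest Control register value has bit 12 set. */
static int control_cmdbuf_bit_set(uint64_t control_reg)
{
	return (int)((control_reg >> CONTROL_BIT_CMDBUF) & 1u);
}
```

The open question remains how the VMM would hand the trapped Control register value to the driver so it can act on it; the helpers only show what would be derived once it arrives.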
>
>> [1]
>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf
>
> Thanks for the doc. So, AMD has:
>
> Command Buffer Base Address Register [MMIO Offset 0008h]
> "used to program the system physical base address and size of the
> command buffer. The command buffer occupies contiguous physical
> memory starting at the programmed base address, up to the
> programmed size."
> Command Buffer Head Pointer Register [MMIO Offset 2000h]
> Command Buffer Tail Pointer Register [MMIO Offset 2008h]
>
> IIUC, AMD should do the same: VMM traps VM's Command Buffer Base
> Address register when the guest kernel allocates a command buffer
> by programming the VM's Command Buffer Base Address register, to
> capture the guest PA and size. Then, VMM allocates a vCMDQ object
> (for this command buffer) forwarding its buffer address and size
> via @addr and @length to the host kernel. Then, the kernel should
> translate the guest PA to host PA to program the HW.
>
> We can see that the Head/Tail registers are in a different MMIO
> page (offset by two 4K pages), which is very like NVIDIA CMDQV
> that allows VMM to mmap that MMIO page of the Head/Tail registers
> for guest OS to directly control the HW (i.e. VMM doesn't trap
> these two registers).
>
> When guest OS wants to issue a new command, the guest kernel can
> just fill the guest command buffer at the entry that the Head
> register points to, and program the Tail register (backed by an
> mmap'd MMIO page), then the HW will read the programmed physical
> address from the entry (Head) till the entry (Tail) in the guest
> command buffer.
Right.
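
For reference, the head/tail handshake can be modeled as an ordinary producer/consumer ring. This is a generic software sketch, not the hardware interface: it assumes the conventional arrangement where the producer writes the entry at the tail index and then advances tail, while the consumer processes entries from head up to tail. The ring size and all names are illustrative; 16 bytes matches the AMD command entry size.

```c
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES 8u   /* illustrative size */
#define ENTRY_BYTES  16u  /* one command entry */

struct cmd_ring {
	uint8_t buf[RING_ENTRIES][ENTRY_BYTES];
	uint32_t head; /* next entry the consumer will read */
	uint32_t tail; /* next entry the producer will write */
};

/* Producer side: copy one command in and advance tail; -1 if the ring is full. */
static int ring_push(struct cmd_ring *r, const void *cmd)
{
	uint32_t next = (r->tail + 1) % RING_ENTRIES;

	if (next == r->head)
		return -1; /* one slot stays empty to tell full from empty */
	memcpy(r->buf[r->tail], cmd, ENTRY_BYTES);
	r->tail = next;
	return 0;
}

/* Consumer side: walk from head to tail and return how many entries were consumed. */
static unsigned int ring_drain(struct cmd_ring *r)
{
	unsigned int n = 0;

	while (r->head != r->tail) {
		r->head = (r->head + 1) % RING_ENTRIES;
		n++;
	}
	return n;
}
```

In the accelerated case the tail update would be a write to the mmap'd MMIO page rather than a store to a struct field, which is why the VMM does not need to trap it.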
>
>>> @@ -170,3 +170,97 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
>>> iommufd_put_object(ucmd->ictx, &viommu->obj);
>>> return rc;
>>> }
>>> +
>>> +void iommufd_vcmdq_destroy(struct iommufd_object *obj)
>>> +{
>>
>> I didn't understand the destroy flow in general. Can you please help me to understand:
>>
>> VMM is expected to track all buffers and call this interface? OR iommufd will
>> take care of it? What happens if VM crashes ?
>
> In a normal routine, VMM gets a vCMDQ object ID for each vCMDQ
> object it allocated. So, it should track all the IDs and release
> them when VM shuts down.
>
> The iommufd core does track all the objects that belong to an
> iommufd context (ictx), and automatically release them. But, it
> can't resolve certain dependency on other FD, e.g. vEVENTQ and
> FAULT QUEUE would return another FD that user space listens to
> and must be closed properly to destroy the QUEUE object.
Got it.
>
>>> + /* The underlying physical pages must be pinned in the IOAS */
>>> + rc = iopt_pin_pages(&viommu->hwpt->ioas->iopt, cmd->addr, cmd->length,
>>> + pages, 0);
>>
>> Why do we need this? is it not pinned already as part of vfio binding?
>
> I think this could be clearer:
> /*
> * The underlying physical pages must be pinned to prevent them from
> * being unmapped (via IOMMUFD_CMD_IOAS_UNMAP) during the life cycle
> * of the vCMDQ object.
> */
Understood.
Thanks
-Vasant