linux-kernel - Re: [PATCH v2 10/22] iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a3860aed-5b6b-4e68-a8fd-1a6ee28ba022@amd.com>
Date: Tue, 29 Apr 2025 15:52:48 +0530
From: Vasant Hegde <vasant.hegde@....com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: jgg@...dia.com, kevin.tian@...el.com, corbet@....net, will@...nel.org,
 bagasdotme@...il.com, robin.murphy@....com, joro@...tes.org,
 thierry.reding@...il.com, vdumpa@...dia.com, jonathanh@...dia.com,
 shuah@...nel.org, jsnitsel@...hat.com, nathan@...nel.org,
 peterz@...radead.org, yi.l.liu@...el.com, mshavit@...gle.com,
 praan@...gle.com, zhangzekun11@...wei.com, iommu@...ts.linux.dev,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, linux-tegra@...r.kernel.org,
 linux-kselftest@...r.kernel.org, patches@...ts.linux.dev, mochs@...dia.com,
 alok.a.tiwari@...cle.com,
 Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: [PATCH v2 10/22] iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC
 ioctl

Hi Nicolin,


On 4/29/2025 12:15 PM, Nicolin Chen wrote:
> On Tue, Apr 29, 2025 at 11:04:06AM +0530, Vasant Hegde wrote:
>> On 4/29/2025 1:32 AM, Nicolin Chen wrote:
>>> On Mon, Apr 28, 2025 at 05:42:27PM +0530, Vasant Hegde wrote:
>>>>> +/**
>>>>> + * struct iommu_vcmdq_alloc - ioctl(IOMMU_VCMDQ_ALLOC)
>>>>> + * @size: sizeof(struct iommu_vcmdq_alloc)
>>>>> + * @flags: Must be 0
>>>>> + * @viommu_id: Virtual IOMMU ID to associate the virtual command queue with
>>>>> + * @type: One of enum iommu_vcmdq_type
>>>>> + * @index: The logical index to the virtual command queue per virtual IOMMU, for
>>>>> + *         a multi-queue model
>>>>> + * @out_vcmdq_id: The ID of the new virtual command queue
>>>>> + * @addr: Base address of the queue memory in the guest physical address space
>>>>
>>>> Sorry. I didn't get this part.
>>>>
>>>> So here `addr` is command queue base address like
>>>>  - NVIDIA's virtual command queue
>>>>  - AMD vIOMMU's command buffer
>>>>
>>>> .. and it will allocate vcmdq for each buffer type. Is that the correct
>>>> understanding?
>>>
>>> Yes. For AMD "vIOMMU", it needs a new type for iommufd vIOMMU:
>>> 	IOMMU_VIOMMU_TYPE_AMD_VIOMMU,
>>>
>>> For AMD "vIOMMU" command buffer, it needs a new type too:
>>> 	IOMMU_VCMDQ_TYPE_AMD_VIOMMU, /* Kdoc it to be Command Buffer */
>>
>> You are suggesting we define one type for AMD and use it for all buffers like
>> command buffer, event log, PPR buffet etc? and use iommu_vcmdq_alloc->index to
>> identity different buffer type?
> 
> We have vEVENTQ for event logging and FAULT_QUEUE for PRI, but both
> are not for hardware accelerated use cases.
> 
> I didn't check the details of AMD's event log and PPR buffers. But
> they seem to be the same ring buffers and can be consumed by guest
> kernel directly?

Right. Event log is accelerated and consumed by guest directly. Also we have
Event Log B !

> 
> Will the hardware replace the physical device ID in the event with
> the virtual device ID when injecting the event to a guest event/PPR
> queue? 
> If so, yea, I think you can define them separately using the> vCMDQ
infrastructures:
>  - IOMMU_VCMDQ_TYPE_AMD_VIOMMU_CMDBUF
>  - IOMMU_VCMDQ_TYPE_AMD_VIOMMU_EVENTLOG
>  - IOMMU_VCMDQ_TYPE_AMD_VIOMMU_PPRLOG
> (@Kevin @Jason Hmm, in this case we might want to revert the naming
>  "vCMDQ" back to "vQEUEUE", once Vasant confirms.)
> 
> Each of them will be allocated on top of one vIOMMU object.
> 
> As for index, it really depends on how vIOMMU manages those vCMDQ
> objects. In NVIDIA case, each VINTF can support multiple vCMDQs of
> the same type (like smp in CPU term). If AMD is a similar case, yea,
> apply an index to each of them is a good idea. Otherwise, if vIOMMU
> could only have three queues and their types are different, perhaps
> the driver-level vIOMMU structure could just hold three pointers to
> manage them without using the index.

Right. May be we can use index.


> 
>>> Then, use IOMMUFD_CMD_VIOMMU_ALLOC ioctl to allocate an vIOMMU
>>> obj, and use IOMMUFD_CMD_VCMDQ_ALLOC ioctl(s) to allocate vCMDQ
>>> objs.
>>>
>>>> In case of AMD vIOMMU, buffer base address is programmed in different register
>>>> (ex: MMIO Offset 0008h Command Buffer Base Address Register) and buffer
>>>> enable/disable is done via different register (ex: MMIO Offset 0018h IOMMU
>>>> Control Register). And we need to communicate both to hypervisor. Not sure this
>>>> API can accommodate this as addr seems to be mandatory.
>>>
>>> NVIDIA's CMDQV has all three of them too. What we do here is to
>>> let VMM trap the buffer base address (in guest physical address
>>> space) and forward it to kernel using this @addr. Then, kernel
>>> will translate this @addr to host physical address space, and
>>> program the physical address and size to the register.
>>
>> Right. For AMD IOMMU 1st 4K of MMIO space (which contains all buffer base
>> address registers) is not accelerated. So we can trap it and pass GPA, size to
>> iommufd.
> 
> Yes.
> 
>> .. but programming base register (like Command buffer base addr) is not
>> sufficient. We have to enable the command buffer by setting particular bit in
>> Control register.  So at high level flow is something like below (@Suravee,
>> correct me if I missed something here).
>>
>> From guest side :
>>   Write command bufer base addr, size (MMIO offset 0x08)
>>   Set MMIO Offset 0x18[bit 12]
>>     Also we need to program few other bits that are not related to these buffers
>>     like `Completion wait interrupt enable`.
>>
>> From VMM side:
>>   We need to trap both register and pass it to iommufd
>>
>> From Host AMD IOMMU driver:
>>   We have to program VFCntlMMIO Offset {16’b[GuestID], 6’b10_0000}
>>
>>
>> We need a way to pass Control register details to iommufd -> AMD driver so that
>> we can program the VF control MMIO register.
>>
>> Since iommu_vcmdq_alloc structure doesn't have user_data, how do we communicate
>> control register?
> 
> BIT(12) is the CMD enable bit. VMM can trap that as the trigger to
> forward the base address/length and other info.

Right. For the control bits which has corresponding buffer we can do that. But
there are few control register bits (like completion wait interrupt enable)
which doesn't have buffer.

> 
> And you'd likely need to define a driver structure:
> 
> // Add this to struct iommu_vcmdq_alloc;
> + * @data_len: Length of the type specific data
> + * @data_uptr: User pointer to the type specific data
> ..
> +	__u32 data_len;
> +	__aligned_u64 data_uptr;
> 
> // Refer to my patch in the v1 series that handles the user_data:
> // https://lore.kernel.org/linux-iommu/5cd2c7c4d92c79baf0cfc59e2a6b3e1db4e86ab8.1744353300.git.nicolinc@nvidia.com/

Right. I have seen V1 series and that thought we can define driver specific data
structure.

-Vasant

> 
> // Define a driver structure
> struct iommu_vcmdq_amd_viommu_cmdbuf {
> 	__u32 flags;
> 	__u32 data;
> };
> 
> Thanks
> Nicolin