lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8bc605ad-54ad-4d3a-9547-caa9d15887f2@amd.com>
Date: Thu, 15 May 2025 20:29:16 +1000
From: Alexey Kardashevskiy <aik@....com>
To: Zhi Wang <zhiw@...dia.com>, Jason Gunthorpe <jgg@...dia.com>
Cc: Xu Yilun <yilun.xu@...ux.intel.com>, kvm@...r.kernel.org,
 dri-devel@...ts.freedesktop.org, linux-media@...r.kernel.org,
 linaro-mm-sig@...ts.linaro.org, sumit.semwal@...aro.org,
 christian.koenig@....com, pbonzini@...hat.com, seanjc@...gle.com,
 alex.williamson@...hat.com, vivek.kasireddy@...el.com,
 dan.j.williams@...el.com, yilun.xu@...el.com, linux-coco@...ts.linux.dev,
 linux-kernel@...r.kernel.org, lukas@...ner.de, yan.y.zhao@...el.com,
 daniel.vetter@...ll.ch, leon@...nel.org, baolu.lu@...ux.intel.com,
 zhenzhong.duan@...el.com, tao1.su@...el.com
Subject: Re: [RFC PATCH 00/12] Private MMIO support for private assigned dev



On 13/5/25 20:03, Zhi Wang wrote:
> On Mon, 12 May 2025 11:06:17 -0300
> Jason Gunthorpe <jgg@...dia.com> wrote:
> 
>> On Mon, May 12, 2025 at 07:30:21PM +1000, Alexey Kardashevskiy wrote:
>>
>>>>> I'm surprised by this.. iommufd shouldn't be doing PCI stuff,
>>>>> it is just about managing the translation control of the device.
>>>>
>>>> I have a little difficulty to understand. Is TSM bind PCI stuff?
>>>> To me it is. Host sends PCI TDISP messages via PCI DOE to put the
>>>> device in TDISP LOCKED state, so that device behaves differently
>>>> from before. Then why put it in IOMMUFD?
>>>
>>>
>>> "TSM bind" sets up the CPU side of it, it binds a VM to a piece of
>>> IOMMU on the host CPU. The device does not know about the VM, it
>>> just enables/disables encryption by a request from the CPU (those
>>> start/stop interface commands). And IOMMUFD won't be doing DOE, the
>>> platform driver (such as AMD CCP) will. Nothing to do for VFIO here.
>>>
>>> We probably should notify VFIO about the state transition but I do
>>> not know VFIO would want to do in response.
>>
>> We have an awkward fit for what CCA people are doing to the various
>> Linux APIs. Looking somewhat maximally across all the arches a "bind"
>> for a CC vPCI device creation operation does:
>>
>>   - Setup the CPU page tables for the VM to have access to the MMIO
>>   - Revoke hypervisor access to the MMIO
>>   - Setup the vIOMMU to understand the vPCI device
>>   - Take over control of some of the IOVA translation, at least for
>> T=1, and route to the the vIOMMU
>>   - Register the vPCI with any attestation functions the VM might use
>>   - Do some DOE stuff to manage/validate TDSIP/etc
>>
>> So we have interactions of things controlled by PCI, KVM, VFIO, and
>> iommufd all mushed together.
>>
>> iommufd is the only area that already has a handle to all the required
>> objects:
>>   - The physical PCI function
>>   - The CC vIOMMU object
>>   - The KVM FD
>>   - The CC vPCI object
>>
>> Which is why I have been thinking it is the right place to manage
>> this.
>>
>> It doesn't mean that iommufd is suddenly doing PCI stuff, no, that
>> stays in VFIO.
>>
>>>>> So your issue is you need to shoot down the dmabuf during vPCI
>>>>> device destruction?
>>>>
>>>> I assume "vPCI device" refers to assigned device in both shared
>>>> mode & prvate mode. So no, I need to shoot down the dmabuf during
>>>> TSM unbind, a.k.a. when assigned device is converting from
>>>> private to shared. Then recover the dmabuf after TSM unbind. The
>>>> device could still work in VM in shared mode.
>>
>> What are you trying to protect with this? Is there some intelism where
>> you can't have references to encrypted MMIO pages?
>>
> 
> I think it is a matter of design choice. The encrypted MMIO page is
> related to the TDI context and secure second level translation table
> (S-EPT). and S-EPT is related to the confidential VM's context.
> 
> AMD and ARM have another level of HW control, together
> with a TSM-owned meta table, can simply mask out the access to those
> encrypted MMIO pages. Thus, the life cycle of the encrypted mappings in
> the second level translation table can be de-coupled from the TDI
> unbound. They can be reaped un-harmfully later by hypervisor in another
> path.
> 
> While on Intel platform, it doesn't have that additional level of
> HW control by design. Thus, the cleanup of encrypted MMIO page mapping
> in the S-EPT has to be coupled tightly with TDI context destruction in
> the TDI unbind process.
> 
> If the TDI unbind is triggered in VFIO/IOMMUFD, there has be a
> cross-module notification to KVM to do cleanup in the S-EPT.

QEMU should know about this unbind and can tell KVM about it too. No cross module notification needed, it is not a hot path.


> So shooting down the DMABUF object (encrypted MMIO page) means shooting
> down the S-EPT mapping and recovering the DMABUF object means
> re-construct the non-encrypted MMIO mapping in the EPT after the TDI is
> unbound.

This is definitely QEMU's job to re-mmap MMIO to the userspace (as it does for non-trusted devices today) so later on nested page fault could fill the nested PTE. Thanks,


> 
> Z.
> 
>>>> What I really want is, one SW component to manage MMIO dmabuf,
>>>> secure iommu & TSM bind/unbind. So easier coordinate these 3
>>>> operations cause these ops are interconnected according to secure
>>>> firmware's requirement.
>>>
>>> This SW component is QEMU. It knows about FLRs and other config
>>> space things, it can destroy all these IOMMUFD objects and talk to
>>> VFIO too, I've tried, so far it is looking easier to manage. Thanks,
>>
>> Yes, qemu should be sequencing this. The kernel only needs to enforce
>> any rules required to keep the system from crashing.
>>
>> Jason
>>
> 

-- 
Alexey


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ