Message-ID: <aFJGet5JS4ed7xfc@yilunxu-OptiPlex-7050>
Date: Wed, 18 Jun 2025 12:54:18 +0800
From: Xu Yilun <yilun.xu@...ux.intel.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>
Cc: kvm@...r.kernel.org, sumit.semwal@...aro.org, christian.koenig@....com,
pbonzini@...hat.com, seanjc@...gle.com, alex.williamson@...hat.com,
jgg@...dia.com, dan.j.williams@...el.com, aik@....com,
linux-coco@...ts.linux.dev, dri-devel@...ts.freedesktop.org,
linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
vivek.kasireddy@...el.com, yilun.xu@...el.com,
linux-kernel@...r.kernel.org, lukas@...ner.de, yan.y.zhao@...el.com,
daniel.vetter@...ll.ch, leon@...nel.org, baolu.lu@...ux.intel.com,
zhenzhong.duan@...el.com, tao1.su@...el.com,
linux-pci@...r.kernel.org, zhiw@...dia.com, simona.vetter@...ll.ch,
shameerali.kolothum.thodi@...wei.com, iommu@...ts.linux.dev,
kevin.tian@...el.com
Subject: Re: [RFC PATCH 19/30] vfio/pci: Add TSM TDI bind/unbind IOCTLs for
 TEE-IO support

On Mon, Jun 16, 2025 at 01:46:42PM +0530, Aneesh Kumar K.V wrote:
> Xu Yilun <yilun.xu@...ux.intel.com> writes:
>
> > On Wed, Jun 04, 2025 at 07:07:18PM +0530, Aneesh Kumar K.V wrote:
> >> Xu Yilun <yilun.xu@...ux.intel.com> writes:
> >>
> >> > On Sun, Jun 01, 2025 at 04:15:32PM +0530, Aneesh Kumar K.V wrote:
> >> >> Xu Yilun <yilun.xu@...ux.intel.com> writes:
> >> >>
> >> >> > Add new IOCTLs to do TSM based TDI bind/unbind. These IOCTLs are
> >> >> > expected to be called by userspace when CoCo VM issues TDI bind/unbind
> >> >> > command to VMM. Specifically for TDX Connect, these commands are some
> >> >> > secure Hypervisor call named GHCI (Guest-Hypervisor Communication
> >> >> > Interface).
> >> >> >
> >> >> > The TSM TDI bind/unbind operations are expected to be initiated by a
> >> >> > running CoCo VM, which already has the legacy assigned device in place.
> >> >> > The TSM bind operation is to request VMM make all secure configurations
> >> >> > to support device work as a TDI, and then issue TDISP messages to move
> >> >> > the TDI to CONFIG_LOCKED or RUN state, waiting for guest's attestation.
> >> >> >
> >> >> > Do TSM Unbind before vfio_pci_core_disable(), otherwise will lead
> >> >> > device to TDISP ERROR state.
> >> >> >
> >> >>
> >> >> Any reason these need to be a vfio ioctl instead of iommufd ioctl?
> >> >> For ex: https://lore.kernel.org/all/20250529133757.462088-3-aneesh.kumar@kernel.org/
> >> >
> >> > A general reason is, the device driver - VFIO should be aware of the
> >> > bound state, and some operations break the bound state. VFIO should also
> >> > know some operations on bound may crash kernel because of platform TSM
> >> > firmware's enforcement. E.g. zapping MMIO, because private MMIO mapping
> >> > in secure page tables cannot be unmapped before TDI STOP [1].
> >> >
> >> > Specifically, for TDX Connect, the firmware enforces MMIO unmapping in
> >> > S-EPT would fail if TDI is bound. For AMD there seems also some
> >> > requirement about this but I need Alexey's confirmation.
> >> >
> >> > [1] https://lore.kernel.org/all/aDnXxk46kwrOcl0i@yilunxu-OptiPlex-7050/
> >> >
> >>
> >> According to the TDISP specification (Section 11.2.6), clearing either
> >> the Bus Master Enable (BME) or Memory Space Enable (MSE) bits will cause
> >> the TDI to transition to an error state. To handle this gracefully, it
> >> seems necessary to unbind the TDI before modifying the BME or MSE bits.
> >
> > Yes. But now the suggestion is never let VFIO do unbind, instead VFIO
> > should block these operations when device is bound.
> >
> >>
> >> If I understand correctly, we also need to unmap the Stage-2 mapping due
> >> to the issue described in commit
> >> abafbc551fddede3e0a08dee1dcde08fc0eb8476. Are there any additional
> >> reasons we would want to unmap the Stage-2 mapping for the BAR (as done
> >> in vfio_pci_zap_and_down_write_memory_lock)?
> >
> > I think no more reason.
> >
> >>
> >> Additionally, with TDX, it appears that before unmapping the Stage-2
> >> mapping for the BAR, we should first unbind the TDI (ie, move it to the
> >> "unlock" state?) Is this step related to Section 11.2.6 of the TDISP spec,
> >> or is it driven by a different requirement?
> >
> > No, this is not device side TDISP requirement. It is host side
> > requirement to fix DMA silent drop issue. TDX enforces CPU S2 PT share
> > with IOMMU S2 PT (does ARM do the same?), so unmap CPU S2 PT in KVM equals
> > unmap IOMMU S2 PT.
> >
> > > If we allow IOMMU S2 PT to be unmapped while the TDI is running, the host
> > > could fool the guest by unmapping some PT entry and suppressing the fault
> > > event. The guest would think a DMA write succeeded when it did not, which
> > > may cause data integrity issues.
> >
>
> I am still trying to find more details here. How did the guest conclude
> DMA writing is successful?

Traditionally the VMM is the trusted entity. If no IOMMU fault is
reported, the guest assumes the DMA write succeeded.

> Guest would timeout waiting for DMA to complete

There is no *generic* mechanism to detect or wait for the completion of a
single DMA write. In PCIe terms, memory writes are "posted": the requester
receives no completion for them.

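To make the silent-drop case concrete, here is a minimal toy model in plain
C. It is only a sketch, not kernel or firmware code: every name in it
(toy_iommu, toy_dma_write, toy_guest_believes_write_ok) is hypothetical and
exists purely for illustration. It assumes one word of memory per page and a
host that can choose not to report IOMMU faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGES 4

/* Hypothetical toy model of an IOMMU stage-2 table; not a real kernel type. */
struct toy_iommu {
	bool mapped[TOY_PAGES];    /* is the stage-2 entry present? */
	bool report_faults;        /* false: host suppresses the fault event */
	unsigned int faults_seen;  /* faults that were made visible to the guest */
	uint32_t mem[TOY_PAGES];   /* guest memory, one word per page */
};

/*
 * Models a posted write: the function returns void because the requester
 * gets no completion, so it cannot tell whether the write landed.
 */
static void toy_dma_write(struct toy_iommu *mmu, size_t page, uint32_t val)
{
	assert(page < TOY_PAGES);

	if (!mmu->mapped[page]) {
		/* Translation fault: the write is dropped. */
		if (mmu->report_faults)
			mmu->faults_seen++;
		return;
	}
	mmu->mem[page] = val;
}

/* All the guest can go on is the absence of a reported fault. */
static bool toy_guest_believes_write_ok(const struct toy_iommu *mmu)
{
	return mmu->faults_seen == 0;
}
```

If the host clears mapped[] for a page and leaves report_faults false, a
later toy_dma_write() to that page changes nothing in mem[], yet
toy_guest_believes_write_ok() still returns true. That is the situation the
platform-specific enforcement is meant to rule out.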
Thanks,
Yilun
> if the host hides the interrupt delivery of failed DMA transfer?
>
> >
> > > This is not a TDX specific problem, but different vendors have different
> > mechanisms for this. For TDX, firmware fails the MMIO unmap for S2. For
> > AMD, will trigger some HW protection called "ASID fence" [1]. Not sure
> > how ARM handles this?
> >
> > https://lore.kernel.org/all/aDnXxk46kwrOcl0i@yilunxu-OptiPlex-7050/
> >
> > Thanks,
> > Yilun
> >
>
> -aneesh