Message-ID:
<SI2PR01MB4393C9F44A5A94D3DC9C28D4DCCEA@SI2PR01MB4393.apcprd01.prod.exchangelabs.com>
Date: Mon, 10 Nov 2025 06:28:58 +0000
From: Wei Wang <wei.w.wang@...mail.com>
To: Tom Lendacky <thomas.lendacky@....com>, Jason Gunthorpe <jgg@...dia.com>
CC: "alex@...zbot.org" <alex@...zbot.org>, "suravee.suthikulpanit@....com"
<suravee.suthikulpanit@....com>, "joro@...tes.org" <joro@...tes.org>,
"kevin.tian@...el.com" <kevin.tian@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "iommu@...ts.linux.dev"
<iommu@...ts.linux.dev>, Alexey Kardashevskiy <aik@....com>
Subject: RE: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for
MMIO-backed addresses
On Saturday, November 8, 2025 3:59 AM, Tom Lendacky wrote:
> On 11/7/25 12:32, Jason Gunthorpe wrote:
> > On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote:
> >
> >> When you are on bare-metal, or in the hypervisor, System Memory
> >> Encryption
> >> (SME) deals with the encryption bit set in the page table entries
> >> (including the nested page table entries for guests).
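
FWIW, on the Linux side this bare-metal C-bit handling boils down to a
simple mask applied when building PTEs / physical addresses; IIRC from
include/linux/mem_encrypt.h (sme_me_mask is 0 when SME is not active,
so these are no-ops there):

	/* OR the C-bit into a physical address / PTE value */
	#define __sme_set(x)	((x) | sme_me_mask)
	/* and strip it again */
	#define __sme_clr(x)	((x) & ~sme_me_mask)
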
> >
> > So "decrypted" means something about AMD's unique memory encryption
> > scheme on bare metal but in a CC guest it is a cross arch 'shared with
> > hypervisor' flag?
>
> Note, that if the encryption bit is not set in the guest, then the host
> encryption key is used if the underlying NPT leaf entry has the encryption bit
> set. In that case, both the host and guest can read the memory, with the
> memory still being encrypted in physical memory.
>
> >
> > What about CXL memory? What about ZONE_DEVICE coherent memory? Do
> > these get the C bit set too?
>
> When CXL memory is presented as system memory to the OS it does support
> the encryption bit. So when pages are allocated for the guest, the memory
> pages will be encrypted with the guest key.
>
> Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented
> to the system as system physical memory that the hypervisor can allocate as
> guest memory?
>
> >
> > :( :( :(
> >
> >> In the guest (prior to Trusted I/O / TDISP), decrypted (or shared)
> >> memory is used because a device cannot DMA to or from guest memory
> >> using the guest encryption key. So all DMA must go to "decrypted"
> >> memory or be bounce-buffered through "decrypted" memory (SWIOTLB) -
> >> basically memory that does not get encrypted/decrypted using the
> >> guest encryption key.
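
Side note: in a CC guest a driver normally gets such shared memory
transparently via dma_alloc_coherent()/SWIOTLB; hand-rolled with the
generic set_memory_decrypted() API, the underlying step would look
roughly like this ("order" below is just a placeholder for the
allocation size):

	#include <linux/gfp.h>
	#include <linux/set_memory.h>

	unsigned long buf = __get_free_pages(GFP_KERNEL, order);

	/* clear the C-bit in the guest page tables so that the
	 * host/device can access the buffer */
	if (buf && set_memory_decrypted(buf, 1 << order))
		buf = 0;	/* conversion failed; don't reuse the pages */
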
> >
> > Where is the code for this? As I wrote we always do sme_set in the
> > iommu driver, even on guests, even for "decrypted" bounce buffered
> > memory.
> >
> > That sounds like a bug by your explanation?
> >
> > Does this mean vIOMMU has never worked in AMD CC guests?
>
> I assume by vIOMMU you mean a VMM-emulated IOMMU in the guest. This
> does not work today with AMD CC guests since it requires the
> hypervisor to read the guest IOMMU buffers in order to emulate the
> behavior and those buffers are encrypted. So there is no vIOMMU support
> today in AMD CC guests.
>
> There was a patch series submitted a while back to allocate the IOMMU
> buffers in shared memory in order to support a (non-secure) vIOMMU in the
> guest in order to support >255 vCPUs, but that was rejected in favor of using
> kvm-msi-ext-dest-id.
>
> https://lore.kernel.org/linux-iommu/20240430152430.4245-1-suravee.suthikulpanit@....com/
>
> >
> >> It is not until we get to Trusted I/O / TDISP where devices will be
> >> able to DMA directly to guest encrypted memory and guests will
> >> require secure MMIO addresses which will need the encryption bit set
> >> (Alexey can correct me on the TIO statements if they aren't correct, as he
> is closer to it all).
> >
> > So in this case we do need to do sme_set on MMIO even though that
> > MMIO is not using the dram encryption key?
>
> @Alexey will be able to provide more details on how this works.
>
Let me also share my perspective on the questions raised:

In the TEE-IO case, trusted device MMIO is mapped at a private Guest
Physical Address (FYI: this can be checked in the SEV-TIO whitepaper and
the Intel TDX Connect architecture spec); that is, the C-bit is added to
the GPA, not to the host physical address. So that case is not related
to the updates introduced by this patch, which handle the C-bit for MMIO
host physical addresses (via iommu_v1_map_pages()).
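
To make the HPA-vs-GPA point concrete, the map path in question applies
the C-bit to the host physical address when the PTE is built; roughly
(simplified from the iommu_v1_map_pages() path in
drivers/iommu/amd/io_pgtable.c, not the exact code):

	/* the host PA gets the C-bit via __sme_set() before
	 * being written into the I/O page table entry */
	__pte = __sme_set(paddr) | IOMMU_PTE_PR;
	if (prot & IOMMU_PROT_IR)
		__pte |= IOMMU_PTE_IR;
	if (prot & IOMMU_PROT_IW)
		__pte |= IOMMU_PTE_IW;
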
Also, the "encryption" bit (the C-bit for AMD, the S-bit for Intel) in
the MMIO GPA does not actually engage the encryption engine (e.g. SME)
in the memory controller for data encryption (communication with the
trusted device is encrypted via IDE); the bit is used for other,
non-encryption purposes.

For ZONE_DEVICE memory (memory hosted on devices), IIUC, it is accessed
via PCIe (not through the on-die memory controller), so SME is not used
to encrypt this type of memory. If we pass such a device through to the
guest using VFIO type1, its memory will be treated as device MMIO, which
skips the C-bit per the change added by this patch; I think that is the
expected behavior.
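
To restate that mechanism as a rough sketch (not the exact diff; using
pfn_valid() as the MMIO check is my simplification, the actual patch may
key off the VFIO pfnmap path instead): a pfn with no struct page backing
is taken as MMIO-backed, type1 flags the mapping with IOMMU_MMIO, and
the IOMMU driver can then avoid applying the C-bit:

	if (!pfn_valid(pfn))
		dma->prot |= IOMMU_MMIO;
	ret = iommu_map(domain->domain, iova,
			(phys_addr_t)pfn << PAGE_SHIFT,
			npage << PAGE_SHIFT, dma->prot, GFP_KERNEL);
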
For CXL memory, when it is used as the guest's system memory, it does
not go through the VFIO PCI BAR MMIO passthrough mechanism, so it also
falls outside the scope of the changes in this patch.