linux-kernel - Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

Message-ID: <CAGtprH92EddcAi6YgfT+Z0LjduRm7=sG-xWwdSudUCt18i=VSw@mail.gmail.com>
Date: Tue, 1 Jul 2025 07:02:52 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, 
	"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>, 
	"Li, Xiaoyao" <xiaoyao.li@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"Hansen, Dave" <dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>, 
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "tabba@...gle.com" <tabba@...gle.com>, 
	"vbabka@...e.cz" <vbabka@...e.cz>, "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, 
	"michael.roth@....com" <michael.roth@....com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>, 
	"Peng, Chao P" <chao.p.peng@...el.com>, "Du, Fan" <fan.du@...el.com>, 
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>, 
	"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "Weiny, Ira" <ira.weiny@...el.com>, 
	"Li, Zhiquan1" <zhiquan1.li@...el.com>, "jroedel@...e.de" <jroedel@...e.de>, 
	"Miao, Jun" <jun.miao@...el.com>, "pgonda@...gle.com" <pgonda@...gle.com>, 
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

On Tue, Jul 1, 2025 at 6:32 AM Vishal Annapurve <vannapurve@...gle.com> wrote:
>
> On Tue, Jul 1, 2025 at 2:38 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > On Tue, Jul 01, 2025 at 01:55:43AM +0800, Edgecombe, Rick P wrote:
> > > So for this we can do something similar. Have the arch/x86 side of TDX grow a
> > > new tdx_buggy_shutdown(). Have it do an all-cpu IPI to kick CPUs out of
> > > SEAMMODE, wbinvd, and set a "no more seamcalls" bool. Then any SEAMCALLs after
> > > that will return a TDX_BUGGY_SHUTDOWN error, or similar. All TDs in the system
> > > die. Zap/cleanup paths return success in the buggy shutdown case.
> > All TDs in the system die could be too severe for unmap errors due to KVM bugs.
>
> At this point, I don't see a way to quantify how bad a KVM bug can get
> unless you have explicit ideas about the severity. We should work on
> minimizing KVM side bugs too and, assuming it would be a rare
> occurrence, I think it's ok to take this intrusive measure.
>
> >
> > > Does it fit? Or, can you guys argue that the failures here are actually non-
> > > special cases that are worth more complex recovery? I remember we talked about
> > > IOMMU patterns that are similar, but it seems like the remaining cases under
> > > discussion are about TDX bugs.
> > I didn't mention TDX connect previously to avoid introducing unnecessary
> > complexity.
> >
> > For TDX connect, S-EPT is used for private mappings in IOMMU. Unmap could
> > therefore fail due to pages being pinned for DMA.
>
> We are discussing this scenario already[1], where the host will not
> pin the pages used by secure DMA for the same reasons why we can't
> have KVM pin the guest_memfd pages mapped in SEPT. Is there some other
> kind of pinning you are referring to?
>
> If there is an ordering in which pages should be unmapped e.g. first
> in secure IOMMU and then KVM SEPT, then we can ensure the right
> ordering between invalidation callbacks from guest_memfd.
>
> [1] https://lore.kernel.org/lkml/CAGtprH_qh8sEY3s-JucW3n1Wvoq7jdVZDDokvG5HzPf0HV2=pg@mail.gmail.com/#t
>
> >
> > So, my thinking was that if that happens, KVM could set a special flag to folios
> > pinned for private DMA.
> >
> > Then guest_memfd could check the special flag before allowing private-to-shared
> > conversion, or punch hole.
> > guest_memfd could check this special flag and choose to poison or leak the
> > folio.
> >
> > Otherwise, if we choose tdx_buggy_shutdown() to "do an all-cpu IPI to kick CPUs
> > out of SEAMMODE, wbinvd, and set a "no more seamcalls" bool", DMAs may still
> > have access to the private pages mapped in S-EPT.
>
> guest_memfd will have to ensure that pages are unmapped from secure
> IOMMU pagetables before allowing them to be used by the host.
>
> If unmapping from the secure IOMMU pagetables fails, I would assume it
> falls into the same category of rare "KVM/TDX module/IOMMUFD" bugs, and
> I think it makes sense to do the same tdx_buggy_shutdown() for such
> failures as well.

In addition, we will need a way to fail all further secure IOMMU table
walks, or some way to stop the active secure DMA by unbinding all the
TDIs. Maybe such scenarios warrant a BUG_ON() if recovery is not
possible, since any or all of KVM, IOMMUFD, and the TDX module can no
longer be trusted to function reliably.
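
To make the tdx_buggy_shutdown() idea quoted above a bit more concrete,
here is a rough userspace model of the "no more seamcalls" latch. This is
only a sketch and not kernel code: every model_* name, the
TDX_BUGGY_SHUTDOWN value, and the stubbed SEAMCALL are assumptions made
up for illustration. The all-CPU IPI and wbinvd steps are represented
only by a comment, and the handling of secure DMA discussed above is not
modeled at all.

```c
/* Userspace model only. All names and values here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TDX_SUCCESS        0ULL
#define TDX_BUGGY_SHUTDOWN 0xdead0001ULL /* made-up error code */

/* The "no more seamcalls" bool; starts out false. */
static atomic_bool model_no_more_seamcalls;

/* Stand-in for issuing a real SEAMCALL; always succeeds in this model. */
uint64_t model_raw_seamcall(uint64_t fn)
{
	(void)fn;
	return TDX_SUCCESS;
}

/* Wrapper that refuses to enter the TDX module once the latch is set. */
uint64_t model_seamcall(uint64_t fn)
{
	if (atomic_load(&model_no_more_seamcalls))
		return TDX_BUGGY_SHUTDOWN;
	return model_raw_seamcall(fn);
}

/*
 * Model of the shutdown itself. A real implementation would first IPI
 * all CPUs to kick them out of SEAM mode and run wbinvd; here only the
 * latch is set.
 */
void model_tdx_buggy_shutdown(void)
{
	atomic_store(&model_no_more_seamcalls, true);
}

/*
 * Zap/cleanup path: returns 0 on success, and also 0 after a buggy
 * shutdown so that teardown can make forward progress. Any other
 * SEAMCALL error is reported as -1.
 */
int model_zap_private_page(uint64_t fn)
{
	uint64_t err = model_seamcall(fn);

	if (err == TDX_SUCCESS || err == TDX_BUGGY_SHUTDOWN)
		return 0;
	return -1;
}
```

The point of routing every call through one wrapper is that the zap and
cleanup paths do not need their own error recovery: once the latch is
set they see a single well-known error and treat it as success.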