Message-ID: <CAGtprH86-HkfnTMmwdPsKgXxjTomvMWWAeCuZKSieb5o6MvRPQ@mail.gmail.com>
Date: Tue, 1 Jul 2025 06:32:38 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
"Li, Xiaoyao" <xiaoyao.li@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Hansen, Dave" <dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>, "tabba@...gle.com" <tabba@...gle.com>,
"vbabka@...e.cz" <vbabka@...e.cz>, "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>,
"michael.roth@....com" <michael.roth@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
"Peng, Chao P" <chao.p.peng@...el.com>, "Du, Fan" <fan.du@...el.com>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "Weiny, Ira" <ira.weiny@...el.com>,
"Li, Zhiquan1" <zhiquan1.li@...el.com>, "jroedel@...e.de" <jroedel@...e.de>,
"Miao, Jun" <jun.miao@...el.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages
On Tue, Jul 1, 2025 at 2:38 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
>
> On Tue, Jul 01, 2025 at 01:55:43AM +0800, Edgecombe, Rick P wrote:
> > So for this we can do something similar. Have the arch/x86 side of TDX grow a
> > new tdx_buggy_shutdown(). Have it do an all-cpu IPI to kick CPUs out of
> > SEAMMODE, wbinvd, and set a "no more seamcalls" bool. Then any SEAMCALLs after
> > that will return a TDX_BUGGY_SHUTDOWN error, or similar. All TDs in the system
> > die. Zap/cleanup paths return success in the buggy shutdown case.
> All TDs in the system die could be too severe for unmap errors due to KVM bugs.
At this point, I don't see a way to quantify how bad a KVM bug can get
unless you have explicit ideas about the severity. We should work on
minimizing KVM-side bugs too, and assuming such failures are rare,
I think it's OK to take this intrusive measure.
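
To make the semantics under discussion concrete, here is a minimal
user-space sketch in C of the "no more seamcalls" idea. It is only a
model, not kernel code: every model_* name and the numeric value of
TDX_BUGGY_SHUTDOWN are assumptions made up for illustration, and the
all-CPU IPI and WBINVD steps are reduced to a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TDX_SUCCESS        0ULL
/* Hypothetical value; the thread only names the error, not its encoding. */
#define TDX_BUGGY_SHUTDOWN 0xDEAD0000ULL

/* The "no more seamcalls" bool from the proposal. */
static atomic_bool model_no_more_seamcalls;

/* Stand-in for issuing a real SEAMCALL; always succeeds in this model. */
static uint64_t model_do_seamcall(uint64_t fn)
{
    (void)fn;
    return TDX_SUCCESS;
}

/* Every SEAMCALL goes through this wrapper and fails fast after shutdown. */
uint64_t model_seamcall(uint64_t fn)
{
    if (atomic_load(&model_no_more_seamcalls))
        return TDX_BUGGY_SHUTDOWN;
    return model_do_seamcall(fn);
}

/*
 * Real code would first IPI all CPUs out of SEAM mode and run WBINVD;
 * this model only latches the flag.
 */
void model_tdx_buggy_shutdown(void)
{
    atomic_store(&model_no_more_seamcalls, true);
}

/* Zap/cleanup path: returns 0 on success, -1 on failure. */
int model_zap_private_page(uint64_t fn)
{
    uint64_t err = model_seamcall(fn);

    /* After a buggy shutdown all TDs are dead, so report success. */
    if (err == TDX_BUGGY_SHUTDOWN)
        return 0;
    return err == TDX_SUCCESS ? 0 : -1;
}
```

The point of the sketch is that callers on the zap path never see an
error once the flag is latched, so they need no extra recovery logic.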
>
> > Does it fit? Or, can you guys argue that the failures here are actually non-
> > special cases that are worth more complex recovery? I remember we talked about
> > IOMMU patterns that are similar, but it seems like the remaining cases under
> > discussion are about TDX bugs.
> I didn't mention TDX connect previously to avoid introducing unnecessary
> complexity.
>
> For TDX connect, S-EPT is used for private mappings in IOMMU. Unmap could
> therefore fail due to pages being pinned for DMA.
We are already discussing this scenario [1]: the host will not pin
the pages used by secure DMA, for the same reasons that KVM can't
pin the guest_memfd pages mapped in SEPT. Is there some other
kind of pinning you are referring to?

If pages must be unmapped in a particular order, e.g. first from the
secure IOMMU and then from the KVM SEPT, then we can enforce that
ordering between the invalidation callbacks from guest_memfd.

[1] https://lore.kernel.org/lkml/CAGtprH_qh8sEY3s-JucW3n1Wvoq7jdVZDDokvG5HzPf0HV2=pg@mail.gmail.com/#t
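
As a rough illustration of what "ensuring the right ordering" could
look like, here is a small user-space C sketch. It does not reflect any
existing guest_memfd interface: the model_* names, the stage enum, and
the callback signature are all hypothetical. Callbacks are stored per
stage and always run secure IOMMU first, then KVM SEPT, regardless of
registration order.

```c
#include <stddef.h>

/* Hypothetical unmap stages, listed in the order they must run. */
enum model_unmap_stage {
    MODEL_STAGE_SECURE_IOMMU = 0,
    MODEL_STAGE_KVM_SEPT     = 1,
    MODEL_STAGE_COUNT
};

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*model_invalidate_fn)(unsigned long start, unsigned long end);

static model_invalidate_fn model_callbacks[MODEL_STAGE_COUNT];

void model_register(enum model_unmap_stage stage, model_invalidate_fn fn)
{
    model_callbacks[stage] = fn;
}

/* Run the callbacks in stage order; stop at the first failure. */
int model_invalidate_range(unsigned long start, unsigned long end)
{
    for (int s = 0; s < MODEL_STAGE_COUNT; s++) {
        if (!model_callbacks[s])
            continue;
        int ret = model_callbacks[s](start, end);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callbacks that only record the order in which they ran. */
int model_log[MODEL_STAGE_COUNT];
int model_log_len;

int model_iommu_unmap(unsigned long start, unsigned long end)
{
    (void)start; (void)end;
    if (model_log_len < MODEL_STAGE_COUNT)
        model_log[model_log_len++] = MODEL_STAGE_SECURE_IOMMU;
    return 0;
}

int model_sept_unmap(unsigned long start, unsigned long end)
{
    (void)start; (void)end;
    if (model_log_len < MODEL_STAGE_COUNT)
        model_log[model_log_len++] = MODEL_STAGE_KVM_SEPT;
    return 0;
}
```

Indexing by stage rather than appending to a list is one simple way to
make the order independent of who registers first; a priority-sorted
list would work equally well.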
>
> So, my thinking was that if that happens, KVM could set a special flag to folios
> pinned for private DMA.
>
> Then guest_memfd could check the special flag before allowing private-to-shared
> conversion, or punch hole.
> guest_memfd could check this special flag and choose to poison or leak the
> folio.
>
> Otherwise, if we choose tdx_buggy_shutdown() to "do an all-cpu IPI to kick CPUs
> out of SEAMMODE, wbinvd, and set a "no more seamcalls" bool", DMAs may still
> have access to the private pages mapped in S-EPT.
guest_memfd will have to ensure that pages are unmapped from the secure
IOMMU page tables before allowing the host to use them.

If unmapping from the secure IOMMU page tables fails, I would assume it
falls into the same category of rare "KVM/TDX module/IOMMUFD" bugs, and
I think it makes sense to do the same tdx_buggy_shutdown() for such
failures as well.