Message-ID: <CAGtprH_Vj=KS0BmiX=P6nUTdYeAZhNEyjrRFXVK0sG=k4gbBMg@mail.gmail.com>
Date: Wed, 11 Jun 2025 07:30:10 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, pbonzini@...hat.com, seanjc@...gle.com,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org,
rick.p.edgecombe@...el.com, dave.hansen@...el.com, kirill.shutemov@...el.com,
tabba@...gle.com, quic_eberman@...cinc.com, michael.roth@....com,
david@...hat.com, vbabka@...e.cz, jroedel@...e.de, thomas.lendacky@....com,
pgonda@...gle.com, zhiquan1.li@...el.com, fan.du@...el.com,
jun.miao@...el.com, ira.weiny@...el.com, isaku.yamahata@...el.com,
xiaoyao.li@...el.com, binbin.wu@...ux.intel.com, chao.p.peng@...el.com
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages
On Wed, Jun 4, 2025 at 7:45 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
>
> We need to restore the previous state (which includes the host page tables)
> if the conversion can't be done.
> That said, in my view, a better flow would be:
>
> 1. guest_memfd sends a pre-invalidation request to users (users here means the
> consumers in kernel of memory allocated from guest_memfd).
>
> 2. Users (A, B, ..., X) perform pre-checks to determine if invalidation can
> proceed. For example, in the case of TDX, this might involve memory
> allocation and page splitting.
>
> 3. Based on the pre-check results, guest_memfd either aborts the invalidation or
> proceeds by sending the actual invalidation request.
>
> 4. Users (A-X) perform the actual unmap operation, ensuring it cannot fail. For
>    TDX, the unmap must succeed unless there are bugs in KVM or the TDX module.
>    In such cases, TDX can call back into guest_memfd to report the poison
>    status of the page or to elevate the page's reference count.
A few questions here:
1) It sounds like a failure to remove entries from the SEPT can only be
caused by bugs in KVM or the TDX module. How reliable would it be to
continue executing TDX VMs on the host once such a bug is hit?
2) Is it reliable to continue executing the host kernel and other
normal VMs once such a bug is hit?
3) Can the memory be reclaimed reliably if the VM is marked as dead
and cleaned up right away?
>
> 5. guest_memfd completes the invalidation process. If the memory is marked as
> "poison," guest_memfd can handle it accordingly. If the page has an elevated
> reference count, guest_memfd may not need to take special action, as the
> elevated count prevents the OS from reallocating the page.
> (but from your reply below, seems a callback to guest_memfd is a better
> approach).
>