linux-kernel - Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

Message-ID: <diqzcya13x2j.fsf@ackerleytng-ctop.c.googlers.com>
Date: Tue, 15 Jul 2025 15:31:48 -0700
From: Ackerley Tng <ackerleytng@...gle.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>
Cc: "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>, 
	"kirill.shutemov@...el.com" <kirill.shutemov@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>, 
	"david@...hat.com" <david@...hat.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>, 
	"vbabka@...e.cz" <vbabka@...e.cz>, "Li, Zhiquan1" <zhiquan1.li@...el.com>, "Du, Fan" <fan.du@...el.com>, 
	"tabba@...gle.com" <tabba@...gle.com>, "seanjc@...gle.com" <seanjc@...gle.com>, "Weiny, Ira" <ira.weiny@...el.com>, 
	"Peng, Chao P" <chao.p.peng@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>, 
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "michael.roth@....com" <michael.roth@....com>, 
	"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Annapurve, Vishal" <vannapurve@...gle.com>, 
	"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pgonda@...gle.com" <pgonda@...gle.com>, 
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

"Edgecombe, Rick P" <rick.p.edgecombe@...el.com> writes:

> On Mon, 2025-07-14 at 12:49 -0700, Ackerley Tng wrote:
>> I'm on board here. So "do nothing" means if there is a TDX unmap failure,
>> 
>> + KVM_BUG_ON() and hence the TD in question stops running,
>>     + No more conversions will be possible for this TD since the TD
>>       stops running.
>>     + Other TDs can continue running?
>> + No refcounts will be taken for the folio/page where the memory failure
>>   happened.
>> + No other indication (including HWpoison) anywhere in folio/page to
>>   indicate this happened.
>
> Yea.
>
>> + To round this topic up, do we do anything else as part of "do nothing"
>>   that I missed? Is there any record in the TDX module (TDX module
>>   itself, not within the kernel)?
>
> We should keep this as an option for how to change the TDX module to make this
> solution safer. For future arch things, we should maybe pursue something that
> works for TDX connect too, which could be more complicated.
>
>> 
>> I'll probably be okay with an answer like "won't know what will happen",
>
> I have not exhaustively checked that there won't be cascading failures. I
> think it's reasonable given this is a bug case which we already have a way to
> catch with a warning.
>
>> but just checking - what might happen if this page that had an unmap
>> failure gets reused? 
>> 
>
> The TDX module has this thing called the PAMT which records how each physical
> page is in use. If KVM tries to re-add the page, the SEAMCALL will check PAMT,
> see it is not in the NDA (Not directly assigned) state, and give an error
> (TDX_OPERAND_PAGE_METADATA_INCORRECT). This is part of the security enforcement.
>
>> Suppose the KVM_BUG_ON() is noted but somehow we
>> couldn't get to the machine in time and the machine continues to serve,
>> and the memory is used by 
>> 
>> 1. Some other non-VM user, something else entirely, say a database?
>
> We are in a "there is a bug" state at this point, which means stability should
> not be expected to be as good. But it should be optimistically ok to re-use the
> page as long as the TD is not re-entered, or otherwise actuated via SEAMCALL.
>
>> 2. Some new non-TDX VM?
>
> Same as (1)
>
>> 3. Some new TD?
>
> As above, the TDX module should prevent this.

Thanks for clarifying! SGTM!

Btw, after some more work on handling memory failures for guest_memfd,
it now seems like it's better for guest_memfd to not use the HWpoison
flag internally either.

So it turns out well that for TDX unmap failures we're aligned on not
using HWpoison :)
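
For the record, here is a toy model of the behaviour discussed above, as I
understand it: the TD is marked as bugged on unmap failure, nothing is
recorded in the folio/page, and the PAMT state alone stops the page from
being added to a new TD. This is only a sketch, not TDX module or KVM code.
Every name prefixed with toy_/TOY_ is made up for illustration, and the
two-state PAMT is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the TDX module's real data structures or ABI. */
enum toy_pamt_state {
	TOY_PT_NDA = 0,		/* not directly assigned: may be added to a TD */
	TOY_PT_ASSIGNED,	/* still recorded as in use by some TD */
};

enum toy_status {
	TOY_SUCCESS = 0,
	/* Stands in for TDX_OPERAND_PAGE_METADATA_INCORRECT. */
	TOY_PAGE_METADATA_INCORRECT,
};

#define TOY_NR_PAGES 8

/* Hypothetical per-page tracking, indexed by a toy page frame number. */
struct toy_tdx_module {
	enum toy_pamt_state pamt[TOY_NR_PAGES];
};

/* Hypothetical per-TD state; "bugged" plays the role of KVM_BUG_ON(). */
struct toy_td {
	bool bugged;
};

/* Adding a page only succeeds if the PAMT entry is in the NDA state. */
static enum toy_status toy_page_add(struct toy_tdx_module *m, size_t pfn)
{
	if (pfn >= TOY_NR_PAGES || m->pamt[pfn] != TOY_PT_NDA)
		return TOY_PAGE_METADATA_INCORRECT;
	m->pamt[pfn] = TOY_PT_ASSIGNED;
	return TOY_SUCCESS;
}

/* Removal can be made to fail on purpose to model the bug case. */
static enum toy_status toy_page_remove(struct toy_tdx_module *m, size_t pfn,
				       bool inject_failure)
{
	if (pfn >= TOY_NR_PAGES || inject_failure ||
	    m->pamt[pfn] != TOY_PT_ASSIGNED)
		return TOY_PAGE_METADATA_INCORRECT;
	m->pamt[pfn] = TOY_PT_NDA;
	return TOY_SUCCESS;
}

/*
 * "Do nothing" policy: on unmap failure only the TD is marked as bugged.
 * No refcount is taken and no HWpoison-like flag is set for the page.
 */
static void toy_unmap(struct toy_tdx_module *m, struct toy_td *td, size_t pfn,
		      bool inject_failure)
{
	if (toy_page_remove(m, pfn, inject_failure) != TOY_SUCCESS)
		td->bugged = true;
}
```

With a zero-initialized struct toy_tdx_module, adding page 3, failing its
unmap and then calling toy_page_add() on page 3 again returns
TOY_PAGE_METADATA_INCORRECT, which is case (3) above. Cases (1) and (2)
never go through toy_page_add(), so nothing in this model stops them.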