Message-ID: <diqzqzyz4lqx.fsf@ackerleytng-ctop.c.googlers.com>
Date: Tue, 01 Jul 2025 14:57:58 -0700
From: Ackerley Tng <ackerleytng@...gle.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>
Cc: "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen, Dave" <dave.hansen@...el.com>, 
	"david@...hat.com" <david@...hat.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>, 
	"vbabka@...e.cz" <vbabka@...e.cz>, "tabba@...gle.com" <tabba@...gle.com>, 
	"Shutemov, Kirill" <kirill.shutemov@...el.com>, "michael.roth@....com" <michael.roth@....com>, 
	"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "seanjc@...gle.com" <seanjc@...gle.com>, 
	"Peng, Chao P" <chao.p.peng@...el.com>, "Du, Fan" <fan.du@...el.com>, 
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "Weiny, Ira" <ira.weiny@...el.com>, 
	"pbonzini@...hat.com" <pbonzini@...hat.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Annapurve, Vishal" <vannapurve@...gle.com>, 
	"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>, 
	"Li, Zhiquan1" <zhiquan1.li@...el.com>, "pgonda@...gle.com" <pgonda@...gle.com>, 
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

Ackerley Tng <ackerleytng@...gle.com> writes:

> "Edgecombe, Rick P" <rick.p.edgecombe@...el.com> writes:
>
>> On Tue, 2025-07-01 at 13:01 +0800, Yan Zhao wrote:
>>> > Maybe Yan can clarify here. I thought the HWpoison scenario was about
>>> > TDX module
>>> My thinking is to set HWPoison to private pages whenever KVM_BUG_ON() was
>>> hit in TDX. i.e., when the page is still mapped in S-EPT but the TD is
>>> bugged on and about to tear down.
>>> 
>>> So, it could be due to KVM or TDX module bugs, which retries can't help.
>>
>> We were going to call back into guestmemfd for this, right? Not set it inside
>> KVM code.
>>
>
> Perhaps we had different understandings of f/g :P
>
> I meant that TDX module should directly set the HWpoison flag on the
> folio (HugeTLB or 4K, guest_memfd or not), not call into guest_memfd.
>

Sorry, correction here, not "TDX module" but the TDX part of KVM within
the kernel. Not the TDX module code itself. Sorry for the confusion.

> guest_memfd will then check this flag when necessary, specifically:
>
> * On faults, either into guest or host page tables
> * When freeing the page
>     * guest_memfd will not return poisoned HugeTLB pages to HugeTLB;
>       it will just leak them
>     * 4K pages go through the normal free path, where
>       free_pages_prepare() checks for HWpoison and skips freeing. The
>       call chain is __folio_put() -> free_frozen_pages() ->
>       __free_frozen_pages() -> free_pages_prepare()
> * I believe guest_memfd doesn't need to check HWpoison on conversions [1]
>
> [1] https://lore.kernel.org/all/diqz5xghjca4.fsf@ackerleytng-ctop.c.googlers.com/
>
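
The checks listed above can be sketched as a small userspace model. This is
only an illustration under stated assumptions: every name here (toy_folio,
toy_gmem_fault(), toy_gmem_free(), the enums) is hypothetical and is not
kernel or guest_memfd code. It only mirrors the described behavior: faults
are refused on poisoned folios, poisoned HugeTLB folios are leaked, and
poisoned 4K folios are skipped by the normal free path.

```c
#include <stdbool.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum toy_folio_kind { TOY_FOLIO_4K, TOY_FOLIO_HUGETLB };

enum toy_free_result {
	TOY_FREED_TO_BUDDY,      /* healthy 4K folio freed normally */
	TOY_RETURNED_TO_HUGETLB, /* healthy HugeTLB folio handed back */
	TOY_LEAKED_POISONED,     /* poisoned HugeTLB folio: leaked on purpose */
	TOY_SKIPPED_POISONED,    /* poisoned 4K folio: free path skips it */
};

struct toy_folio {
	enum toy_folio_kind kind;
	bool hwpoison; /* stands in for the HWpoison flag set by the TDX part of KVM */
};

/* Fault path (guest or host page tables): refuse poisoned folios. */
bool toy_gmem_fault(const struct toy_folio *folio)
{
	return !folio->hwpoison;
}

/* Free path: the flag is consulted when the folio is released. */
enum toy_free_result toy_gmem_free(const struct toy_folio *folio)
{
	if (folio->kind == TOY_FOLIO_HUGETLB)
		return folio->hwpoison ? TOY_LEAKED_POISONED
				       : TOY_RETURNED_TO_HUGETLB;

	/* 4K: models the HWpoison check described for free_pages_prepare(). */
	return folio->hwpoison ? TOY_SKIPPED_POISONED : TOY_FREED_TO_BUDDY;
}
```

In this model the conversion path never reads the flag, matching the belief
stated in the last bullet that conversions do not need the check.
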
>> What about a kvm_gmem_buggy_cleanup() instead of the system-wide one? KVM calls
>> it and then proceeds to bug the TD only from the KVM side. It's not as safe for
>> the system, because who knows what a buggy TDX module could do. But TDX module
>> could also be buggy without the kernel catching wind of it.
>>
>> Having a single callback to basically bug the fd would solve the atomic context
>> issue. Then guestmemfd could dump the entire fd into memory_failure() instead of
>> returning the pages. And developers could respond by fixing the bug.
>>
>
> This could work too.
>
> I'm in favor of buying into the HWpoison system though, since we're
> quite sure this is fair use of HWpoison.
>
> Are you saying kvm_gmem_buggy_cleanup() will just set the HWpoison flag
> on the parts of the folios in trouble?
>
>> IMO maintainability needs to be balanced with efforts to minimize the fallout
>> from bugs. In the end a system that is too complex is going to have more bugs
>> anyway.
>>
>>> 
>>> > bugs. Not TDX busy errors, demote failures, etc. If there are "normal"
>>> > failures, like the ones that can be fixed with retries, then I think
>>> > HWPoison is not a good option though.
>>> > 
>>> > >   there is a way to make 100%
>>> > > sure all memory becomes re-usable by the rest of the host, using
>>> > > tdx_buggy_shutdown(), wbinvd, etc?
>>> 
>>> Not sure about this approach. When TDX module is buggy and the page is still
>>> accessible to guest as private pages, even with no-more SEAMCALLs flag, is it
>>> safe enough for guest_memfd/hugetlb to re-assign the page to allow
>>> simultaneous access in shared memory with potential private access from
>>> TD or TDX module?
>>
>> With the no-more-SEAMCALLs approach it should be safe (for the system). This is
>> essentially what we are doing for kexec.
