linux-kernel - Re: [PATCH 2/2] KVM: TDX: Do not clear poisoned pages


Message-ID: <b439abd6-9fd9-4f51-82e2-c8b1304e7cca@intel.com>
Date: Thu, 26 Jun 2025 15:33:11 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>, "Luck, Tony" <tony.luck@...el.com>,
 "Hunter, Adrian" <adrian.hunter@...el.com>,
 "Annapurve, Vishal" <vannapurve@...gle.com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
 "Li, Xiaoyao" <xiaoyao.li@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
 "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
 "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
 "mingo@...hat.com" <mingo@...hat.com>, "seanjc@...gle.com"
 <seanjc@...gle.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "tglx@...utronix.de" <tglx@...utronix.de>,
 "Yamahata, Isaku" <isaku.yamahata@...el.com>,
 "tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>,
 "binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
 "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
 "hpa@...or.com" <hpa@...or.com>, "Chatre, Reinette"
 <reinette.chatre@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
 "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "bp@...en8.de"
 <bp@...en8.de>, "Gao, Chao" <chao.gao@...el.com>,
 "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 2/2] KVM: TDX: Do not clear poisoned pages

On 6/26/25 15:20, Huang, Kai wrote:
> But IMHO we may should just have a simple policy that when a page is marked
> as poisoned, it should never be touched again.  It's only one page anyway
> (for one TD) so losing that doesn't seem bad to me.  If we want to clear the
> poisoned page, then perhaps we should mark that page to be not-poisoned
> again.

The simplest policy is to do nothing.

The kernel only has 29 places that check PageHWPoison(). I'd guess that
roughly half of those are the memory-failure.c infrastructure and
bare-minimum code to handle poison, like not allowing pages to go back
into the allocator.

There are something like 5,000 lines of code in the kernel that deal
with a literal 'struct page'. 29 checks for ~5,000 sites is pretty
minuscule. We obviously don't have a policy that every place that uses
'struct page' needs to check for poison. We also don't even have a
policy where writes to or reads from a page check for poison.

Why is this TDX code so special that PageHWPoison() needs to be checked?
For instance:

$ grep -r PageHWPoison arch/x86/
arch/x86/kernel/cpu/mce/core.c:	SetPageHWPoison(p);
arch/x86/kernel/cpu/mce/core.c:	SetPageHWPoison(p);

In other words, this would be the *ONLY* arch/x86 site. Why?