Message-ID: <b439abd6-9fd9-4f51-82e2-c8b1304e7cca@intel.com>
Date: Thu, 26 Jun 2025 15:33:11 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>, "Luck, Tony" <tony.luck@...el.com>,
"Hunter, Adrian" <adrian.hunter@...el.com>,
"Annapurve, Vishal" <vannapurve@...gle.com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Li, Xiaoyao" <xiaoyao.li@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"mingo@...hat.com" <mingo@...hat.com>, "seanjc@...gle.com"
<seanjc@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>,
"tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"hpa@...or.com" <hpa@...or.com>, "Chatre, Reinette"
<reinette.chatre@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "bp@...en8.de"
<bp@...en8.de>, "Gao, Chao" <chao.gao@...el.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 2/2] KVM: TDX: Do not clear poisoned pages
On 6/26/25 15:20, Huang, Kai wrote:
> But IMHO we may should just have a simple policy that when a page is marked
> as poisoned, it should never be touched again. It's only one page anyway
> (for one TD) so losing that doesn't seem bad to me. If we want to clear the
> poisoned page, then perhaps we should mark that page to be not-poisoned
> again.
The simplest policy is to do nothing.
The kernel only has 29 places that check PageHWPoison(). I'd guess that
roughly half of those are the memory-failure.c infrastructure and
bare-minimum code to handle poison, like not allowing pages to go back
into the allocator.
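
A count like that can be roughly reproduced with a whole-word grep, which skips SetPageHWPoison() and the other setter/clearer variants. This is only a sketch: count_sites is a hypothetical helper name, and the exact number depends on the tree version and on whether comments and macro definitions are counted.

```shell
# Hypothetical helper (not part of the kernel tree): count the lines under
# a directory that mention an identifier as a whole word.
count_sites() {
    pattern=$1
    dir=$2
    # -r: recurse; -w: whole-word match, so "SetPageHWPoison" does not
    # match "PageHWPoison"; -h: omit file names from the output.
    # wc -l counts the matching lines; tr strips any padding wc adds.
    grep -rwh -- "$pattern" "$dir" | wc -l | tr -d ' '
}
```

From the top of a kernel tree this would be run as `count_sites PageHWPoison .`.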
There are something like 5,000 lines of code in the kernel that deal
with a literal 'struct page'. 29 checks for ~5,000 sites is pretty
minuscule. We obviously don't have a policy that every place that uses
'struct page' needs to check for poison. We also don't even have a
policy where writes to or reads from a page check for poison.
Why is this TDX code so special that PageHWPoison() needs to be checked?
For instance:

$ grep -r PageHWPoison arch/x86/
arch/x86/kernel/cpu/mce/core.c: SetPageHWPoison(p);
arch/x86/kernel/cpu/mce/core.c: SetPageHWPoison(p);

In other words, this would be the *ONLY* arch/x86 site. Why?