Message-ID: <DS7PR11MB6077ED08B85A000014BDAE00FC7AA@DS7PR11MB6077.namprd11.prod.outlook.com>
Date: Thu, 26 Jun 2025 15:31:02 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>, "Hansen, Dave"
<dave.hansen@...el.com>, "Hunter, Adrian" <adrian.hunter@...el.com>,
"Annapurve, Vishal" <vannapurve@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Li, Xiaoyao"
<xiaoyao.li@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>, "Chatre,
Reinette" <reinette.chatre@...el.com>, "seanjc@...gle.com"
<seanjc@...gle.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "kirill.shutemov@...ux.intel.com"
<kirill.shutemov@...ux.intel.com>, "mingo@...hat.com" <mingo@...hat.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "hpa@...or.com"
<hpa@...or.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"bp@...en8.de" <bp@...en8.de>, "Gao, Chao" <chao.gao@...el.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH 2/2] KVM: TDX: Do not clear poisoned pages
> However if the kernel touch the page again using MOVDIR64B, the further #MC
> won't have MCG_STATUS_SEAM_NR bit set (because it doesn't happen in SEAM
> non-root), therefore it will be treated as a normal kernel #MC which will
> result in kernel panic.
Intel CPUs signal #MC when an instruction that is trying to consume poison data
is about to retire.
Stores to memory do not consume poison, so they will not signal a #MC.
In the MOVDIR64B case an entire cache line is stored in a single atomic
write. This will clear the poison state of the cache line, assuming that the
poison was due to an integrity error, memory error injection, I/O error, etc.
If the DIMM is bad and has stuck bits, then the memory may still fail the ECC
check on the next read.
Using smaller stores to overwrite the cache line will not clear poison. The
cache line is read from memory to some cache level, the small store updates
some bytes in the line, but the poison flag remains. Note that this doesn't
trigger #MC because the poison data is not being consumed: it still isn't
architecturally visible in some register, memory, or I/O device.
You may still see a UCNA signature signaled with CMCI from the memory
controller if either case resulted in a speculative prefetch of the poisoned
cache line.
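
The store and load rules above can be summarised with a small toy model in C.
This is only a sketch for illustration, not kernel or hardware code: the
struct toy_line type, its poisoned and stuck_bits fields, and the toy_*
helpers are all hypothetical names, and the bad-DIMM case is simplified to
"always fails the ECC check again" where the real behavior is only "may fail".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64	/* bytes per cache line */

/* Hypothetical model of one cache line. Not a real kernel structure. */
struct toy_line {
	uint8_t data[LINE_SIZE];
	bool poisoned;		/* poison marker carried with the line */
	bool stuck_bits;	/* assumption: DIMM is physically bad */
};

/*
 * Full-line store, modelling a single atomic 64-byte write such as
 * MOVDIR64B. Every byte is replaced, so the poison state is cleared.
 * Nothing is consumed, so no #MC is modelled here.
 */
void toy_store_full(struct toy_line *l, const uint8_t src[LINE_SIZE])
{
	memcpy(l->data, src, LINE_SIZE);
	l->poisoned = false;
}

/*
 * Smaller store: the line is brought into the cache, some bytes are
 * merged in, and the poison flag is left untouched. Still no #MC,
 * because the poisoned data is not consumed.
 */
void toy_store_partial(struct toy_line *l, size_t off,
		       const uint8_t *src, size_t len)
{
	assert(off <= LINE_SIZE && len <= LINE_SIZE - off);
	memcpy(l->data + off, src, len);
}

/*
 * Load: returns true if the model would signal #MC, i.e. the
 * instruction is about to consume poisoned data. On success the byte
 * is written to *out and false is returned.
 */
bool toy_load(struct toy_line *l, size_t off, uint8_t *out)
{
	assert(off < LINE_SIZE);
	if (l->stuck_bits)
		l->poisoned = true;	/* simplification: ECC fails again */
	if (l->poisoned)
		return true;
	*out = l->data[off];
	return false;
}
```

In this model a poisoned line stays poisoned after toy_store_partial() and a
later toy_load() reports #MC, while toy_store_full() followed by toy_load()
succeeds unless stuck_bits is set.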
-Tony