Message-ID: <f51e62543aa765da3b4f4ed19aa13340881fbc89.camel@intel.com>
Date: Thu, 26 Jun 2025 22:20:53 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, "Hansen, Dave"
	<dave.hansen@...el.com>, "Hunter, Adrian" <adrian.hunter@...el.com>,
	"Annapurve, Vishal" <vannapurve@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
	"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
	"mingo@...hat.com" <mingo@...hat.com>, "seanjc@...gle.com"
	<seanjc@...gle.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "tony.lindgren@...ux.intel.com"
	<tony.lindgren@...ux.intel.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "linux-edac@...r.kernel.org"
	<linux-edac@...r.kernel.org>, "hpa@...or.com" <hpa@...or.com>, "Chatre,
 Reinette" <reinette.chatre@...el.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	"bp@...en8.de" <bp@...en8.de>, "Gao, Chao" <chao.gao@...el.com>,
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 2/2] KVM: TDX: Do not clear poisoned pages

On Thu, 2025-06-26 at 15:31 +0000, Luck, Tony wrote:
> > However if the kernel touches the page again using MOVDIR64B, the further #MC
> > won't have MCG_STATUS_SEAM_NR bit set (because it doesn't happen in SEAM
> > non-root), therefore it will be treated as a normal kernel #MC which will
> > result in kernel panic.
> 
> Intel CPUs signal #MC when an instruction that is trying to consume poison data
> is about to retire.
> 
> Stores to memory do not consume poison, so will not signal a #MC.
> 
> In the MOVDIR64B case an entire cache line is stored in a single atomic
> write. This will clear the poison state of the cacheline (assuming that the
> poison was due to an integrity error, memory error injection, I/O error etc.
> If the DIMM is bad and has stuck bits, then the memory may still fail ECC
> check on the next read.)
> 
> Using smaller stores to overwrite the cache line will not clear poison. The
> cacheline is read from memory to some cache level, the small store updates
> some bytes in the line, but the poison flag remains. Note that this doesn't
> trigger #MC because the poison data is not being consumed, it still isn't
> architecturally visible in some register, memory, or I/O device.
> 
> You may still see a UCNA signature signaled with CMCI from the memory
> controller if either case resulted in a speculative prefetch of the poisoned
> cache line.
> 
> -Tony

Thanks for the info.  :-)

So it seems that a MOVDIR64B to bad memory won't necessarily trigger a #MC
when the write is performed.
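
To make sure I read the above correctly, here is a toy user-space model of
the behaviour Tony describes. It is only a sketch: `struct cache_line`,
`store_full_line()`, `store_partial()` and `load_byte()` are hypothetical
names, not kernel or hardware interfaces, and the model assumes the poison
did not come from stuck bits on a bad DIMM (in which case the next read may
still fail the ECC check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 64	/* bytes written by one MOVDIR64B */

/* Hypothetical model of one cache line plus its poison flag. */
struct cache_line {
	unsigned char data[LINE_SIZE];
	bool poisoned;
};

/*
 * Models a full-line atomic store (MOVDIR64B-like): all 64 bytes are
 * replaced, nothing is consumed, and the poison state is cleared.
 */
void store_full_line(struct cache_line *line, const unsigned char *src)
{
	memcpy(line->data, src, LINE_SIZE);
	line->poisoned = false;
}

/*
 * Models a smaller store: some bytes are updated, but the poison flag
 * remains.  No #MC is modelled because no poisoned data is consumed.
 */
void store_partial(struct cache_line *line, size_t off,
		   const unsigned char *src, size_t len)
{
	assert(off <= LINE_SIZE && len <= LINE_SIZE - off);
	memcpy(line->data + off, src, len);
}

/*
 * Models a load: returns false (standing in for a #MC) if the line is
 * poisoned, otherwise stores the byte in *out and returns true.
 */
bool load_byte(const struct cache_line *line, size_t off, unsigned char *out)
{
	assert(off < LINE_SIZE);
	if (line->poisoned)
		return false;
	*out = line->data[off];
	return true;
}
```

In this model a `load_byte()` on a poisoned line keeps failing after any
number of `store_partial()` calls, and only starts succeeding after a
`store_full_line()`.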

But IMHO we should probably just have a simple policy: once a page is marked
as poisoned, it is never touched again.  It's only one page anyway (for one
TD), so losing it doesn't seem bad to me.  If we do want to clear a poisoned
page, then perhaps we should also mark that page as not poisoned again.
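
As a rough illustration of that policy (again a user-space sketch, not the
actual patch; `struct page_model`, `reclaim_clear_page()` and
`clear_and_unpoison_page()` are made-up names):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a page and its poison marker. */
struct page_model {
	unsigned char data[MODEL_PAGE_SIZE];
	bool poisoned;
};

/*
 * Clear a page on reclaim, unless it is marked poisoned.  A poisoned
 * page is left completely untouched (effectively leaked).
 * Returns true if the page was cleared, false if it was skipped.
 */
bool reclaim_clear_page(struct page_model *page)
{
	assert(page != NULL);
	if (page->poisoned)
		return false;
	memset(page->data, 0, sizeof(page->data));
	return true;
}

/*
 * The alternative: if the page is deliberately cleared, the poison
 * marker is dropped in the same step so the two never disagree.
 * (On real hardware the clear would have to use full cache line
 * writes; memset() here is only a placeholder for that.)
 */
void clear_and_unpoison_page(struct page_model *page)
{
	assert(page != NULL);
	memset(page->data, 0, sizeof(page->data));
	page->poisoned = false;
}
```

The point is only that "clear the contents" and "still marked poisoned"
should never both be true for the same page.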
