linux-kernel - RE: [PATCH 4/7] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum


Message-ID: <DM8PR11MB575071F87791817215355DD8E7E7A@DM8PR11MB5750.namprd11.prod.outlook.com>
Date: Thu, 2 Oct 2025 06:59:04 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: "Annapurve, Vishal" <vannapurve@...gle.com>, "Hansen, Dave"
	<dave.hansen@...el.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"bp@...en8.de" <bp@...en8.de>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"peterz@...radead.org" <peterz@...radead.org>, "mingo@...hat.com"
	<mingo@...hat.com>, "hpa@...or.com" <hpa@...or.com>,
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "x86@...nel.org"
	<x86@...nel.org>, "kas@...nel.org" <kas@...nel.org>, "Edgecombe, Rick P"
	<rick.p.edgecombe@...el.com>, "dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
	"Huang, Kai" <kai.huang@...el.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Chatre, Reinette" <reinette.chatre@...el.com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "Williams, Dan J" <dan.j.williams@...el.com>,
	"ashish.kalra@....com" <ashish.kalra@....com>, "nik.borisov@...e.com"
	<nik.borisov@...e.com>, "Gao, Chao" <chao.gao@...el.com>, "sagis@...gle.com"
	<sagis@...gle.com>, "Chen, Farrah" <farrah.chen@...el.com>, Binbin Wu
	<binbin.wu@...ux.intel.com>
Subject: RE: [PATCH 4/7] x86/kexec: Disable kexec/kdump on platforms with TDX
 partial write erratum

> On Wed, Oct 1, 2025 at 7:32 AM Dave Hansen <dave.hansen@...el.com>
> wrote:
> >
> > On 9/30/25 19:05, Vishal Annapurve wrote:
> > ...
> > >> Any workarounds are going to be slow and probably imperfect. That's not
> > >
> > > Do we really need to deploy workarounds that are complex and slow to
> > > get kdump working for the majority of the scenarios? Is there any
> > > analysis done for the risk with imperfect and simpler workarounds vs
> > > benefits of kdump functionality?
> > >
> > >> a great match for kdump. I'm perfectly happy waiting for fixed hardware
> > >> from what I've seen.
> > >
> > > IIUC SPR/EMR - two CPU generations out there are impacted by this
> > > erratum and just disabling kdump functionality IMO is not the best
> > > solution here.
> >
> > That's an eminently reasonable position. But we're speaking in broad
> > generalities and I'm unsure what you don't like about the status quo or
> > how you'd like to see things change.
> 
> Looks like the decision to disable kdump was taken between [1] -> [2].
> "The kernel currently doesn't track which page is TDX private memory.
> It's not trivial to reset TDX private memory.  For simplicity, this
> series simply disables kexec/kdump for such platforms.  This will be
> enhanced in the future."
> 
> A patch [3] from the series[1], describes the issue as:
> "This problem is triggered by "partial" writes where a write transaction
> of less than cacheline lands at the memory controller.  The CPU does
> these via non-temporal write instructions (like MOVNTI), or through
> UC/WC memory mappings.  The issue can also be triggered away from the
> CPU by devices doing partial writes via DMA."
> 
> And also mentions:
> "Also note only the normal kexec needs to worry about this problem, but
> not the crash kexec: 1) The kdump kernel only uses the special memory
> reserved by the first kernel, and the reserved memory can never be used
> by TDX in the first kernel; 2) The /proc/vmcore, which reflects the
> first (crashed) kernel's memory, is only for read.  The read will never
> "poison" TDX memory thus cause unexpected machine check (only partial
> write does)."

While the statement that a read will never poison memory is correct, as I
understand it there is still a scenario we could theoretically worry about:

1. While running on a platform with the partial write erratum, the host OS or
another actor executing outside of SEAM mode triggers a partial write into a
cache line that originally belonged to TDX private memory. This is something
that the host OS or other entities should not do, but it could happen due to
host OS bugs, etc.
2. This causes that cache line to be poisoned by the memory controller.
However, we assume here that no one (the TDX module, TD guests, or the host OS)
accesses this cache line for the time being, so the problem remains hidden.
3. The host OS crashes due to some other issue, the kdump crash kernel is
triggered, and kdump starts reading all the memory of the previous host kernel
to dump the diagnostic info.
4. At some point, the kdump crash kernel reaches the memory containing the
poisoned cache line, consumes the poison, and a #MC is raised in kernel space.
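
To make the sequence above concrete, here is a toy model of the four steps.
It is only a sketch: ToyMemory, MachineCheck, kdump_scan and run_scenario are
made-up names, the 8-line memory and the 4-byte write are arbitrary, and
nothing here corresponds to real kernel or TDX module interfaces.

```python
# Toy model only: every name and number here is an illustrative assumption,
# not a kernel or TDX module interface.
CACHE_LINE = 64  # bytes per cache line


class MachineCheck(Exception):
    """Stands in for the #MC raised when poisoned memory is consumed."""


class ToyMemory:
    def __init__(self, num_lines, tdx_private_lines):
        self.num_lines = num_lines
        self.tdx_private = set(tdx_private_lines)
        self.poisoned = set()

    def write(self, line, nbytes):
        # Steps 1-2: a partial (less than a cache line) write from outside
        # SEAM mode to a TDX private line silently poisons that line.
        if line in self.tdx_private and nbytes < CACHE_LINE:
            self.poisoned.add(line)

    def read(self, line):
        # A read never creates poison, but it does consume existing poison.
        if line in self.poisoned:
            raise MachineCheck(f"poison consumed at line {line}")
        return 0


def kdump_scan(mem):
    """Steps 3-4: the crash kernel reads all of the old kernel's memory."""
    for line in range(mem.num_lines):
        mem.read(line)
    return True


def run_scenario():
    mem = ToyMemory(num_lines=8, tdx_private_lines={5})
    mem.write(5, 4)  # buggy host does a 4-byte write into a private line
    try:
        kdump_scan(mem)
    except MachineCheck as exc:
        return str(exc)
    return "dump completed"
```

In this model the dump itself only reads, yet it still trips over poison that
an earlier partial write left behind, which is the case I am asking about.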

Isn't this the reason for also disabling kdump? Or am I missing something?

Best Regards,
Elena.