Message-ID: <5b007887-d475-4970-b01d-008631621192@intel.com>
Date: Thu, 2 Oct 2025 08:06:33 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Juergen Gross <jgross@...e.com>,
"Reshetova, Elena" <elena.reshetova@...el.com>,
"Annapurve, Vishal" <vannapurve@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "bp@...en8.de" <bp@...en8.de>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>, "hpa@...or.com" <hpa@...or.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
"x86@...nel.org" <x86@...nel.org>, "kas@...nel.org" <kas@...nel.org>,
"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"dwmw@...zon.co.uk" <dwmw@...zon.co.uk>, "Huang, Kai" <kai.huang@...el.com>,
"seanjc@...gle.com" <seanjc@...gle.com>,
"Chatre, Reinette" <reinette.chatre@...el.com>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>,
"Williams, Dan J" <dan.j.williams@...el.com>,
"ashish.kalra@....com" <ashish.kalra@....com>,
"nik.borisov@...e.com" <nik.borisov@...e.com>, "Gao, Chao"
<chao.gao@...el.com>, "sagis@...gle.com" <sagis@...gle.com>,
"Chen, Farrah" <farrah.chen@...el.com>, Binbin Wu <binbin.wu@...ux.intel.com>
Subject: Re: [PATCH 4/7] x86/kexec: Disable kexec/kdump on platforms with TDX
partial write erratum

On 10/2/25 00:46, Juergen Gross wrote:
> So let's compare the 2 cases with kdump enabled and disabled in your
> scenario (crash of the host OS):
>
> kdump enabled: No dump can be produced due to the #MC and system is
> rebooted.
>
> kdump disabled: No dump is produced and system is rebooted after crash.
>
> What is the main concern with kdump enabled? I don't see any
> disadvantage with enabling it, just the advantage that in many cases
> a dump will be written.

The disadvantage is that a kernel bug from long ago results in a machine
check. Machine checks are generally indicative of bad hardware. So the
disadvantage is that someone mistakes the long-ago kernel bug for bad
hardware.

There are two ways of looking at this:

1. A theoretically fragile kdump is better than no kdump at all. All of
   the stars would have to align for kdump to _fail_ and we don't think
   that's going to happen often enough to matter.
2. kdump happens after kernel bugs. The machine checks happen because of
   kernel bugs. It's not a big stretch to think that, at scale, kdump is
   going to run into these #MCs on a regular basis.

Does that capture the two perspectives fairly?