[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e54d68e7-3d57-4128-8c07-dd6c66196a7e@intel.com>
Date: Tue, 4 Jun 2024 12:57:50 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Tom Lendacky <thomas.lendacky@....com>, <linux-kernel@...r.kernel.org>
CC: <x86@...nel.org>, <dave.hansen@...el.com>, <bp@...en8.de>,
<kirill.shutemov@...ux.intel.com>, <tglx@...utronix.de>, <mingo@...hat.com>,
<hpa@...or.com>, <luto@...nel.org>, <peterz@...radead.org>,
<rick.p.edgecombe@...el.com>, <ashish.kalra@....com>, <chao.gao@...el.com>,
<bhe@...hat.com>, <nik.borisov@...e.com>, <pbonzini@...hat.com>,
<seanjc@...gle.com>
Subject: Re: [PATCH v4 1/5] x86/kexec: do unconditional WBINVD for bare-metal
in stop_this_cpu()
On 1/06/2024 8:45 am, Tom Lendacky wrote:
> On 5/22/24 21:49, Huang, Kai wrote:
>> On 18/04/2024 11:48 pm, Kai Huang wrote:
>>> TL;DR:
>>>
>>> Change to do unconditional WBINVD in stop_this_cpu() for bare metal
>>> to cover kexec support for both AMD SME and Intel TDX, despite there
>>> _was_ some issue preventing from doing so but now has it got fixed.
>>>
>>> Long version:
>>>
>>> Both AMD SME and Intel TDX can leave caches in an incoherent state due
>>> to memory encryption, which can lead to silent memory corruption during
>>> kexec. To address this issue, it is necessary to flush the caches
>>> before jumping to the second kernel.
>>>
>>> Currently, the kernel only performs WBINVD in stop_this_cpu() when SME
>>> is supported by hardware. To support TDX, instead of adding one more
>>> vendor-specific check, it is proposed to perform unconditional WBINVD.
>>> Kexec() is a slow path, and the additional WBINVD is acceptable for the
>>> sake of simplicity and maintainability.
>>>
>>
>> Hi Tom,
>>
>> May I ask how does SME work with kdump in crash_kexec(). Looking at
>> the code, AFAICT the crash_kexec() path doesn't use stop_this_cpu() to
>> stop all other cpus. Instead, kdump_nmi_shootdown_cpus() is called to
>> send NMI to remote cpus and crash_nmi_callback() is invoked to stop them.
>>
>> But the crash_nmi_callback() doesn't invoke WBINVD for SME AFAICT. It
>> does call the kdump_nmi_callback() callback where a WBINVD is
>> performed for the SNP host:
>>
>> void kdump_sev_callback(void)
>> {
>> /*
>> * Do wbinvd() on remote CPUs when SNP is enabled in order to
>> * safely do SNP_SHUTDOWN on the local CPU.
>> */
>> if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
>> wbinvd();
>> }
>>
>> So if I read correctly, what's the reason the WBINVD is skipped for
>> SME in case of crash_kexec()?
>
> The system is rebooted after a crash and doesn't continue directly on
> into a new kernel.
>
How about the kdump kernel itself? Would the stale cachelines
potentially corrupt it?
And how about /proc/vmcore, which reflects the system RAM used by the
first, crashed, kernel? Is it OK to have stale cachelines for it?
Powered by blists - more mailing lists