linux-kernel - Re: [PATCH v4 1/5] x86/kexec: do unconditional WBINVD for bare-metal in stop_this

Message-ID: <e54d68e7-3d57-4128-8c07-dd6c66196a7e@intel.com>
Date: Tue, 4 Jun 2024 12:57:50 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Tom Lendacky <thomas.lendacky@....com>, <linux-kernel@...r.kernel.org>
CC: <x86@...nel.org>, <dave.hansen@...el.com>, <bp@...en8.de>,
	<kirill.shutemov@...ux.intel.com>, <tglx@...utronix.de>, <mingo@...hat.com>,
	<hpa@...or.com>, <luto@...nel.org>, <peterz@...radead.org>,
	<rick.p.edgecombe@...el.com>, <ashish.kalra@....com>, <chao.gao@...el.com>,
	<bhe@...hat.com>, <nik.borisov@...e.com>, <pbonzini@...hat.com>,
	<seanjc@...gle.com>
Subject: Re: [PATCH v4 1/5] x86/kexec: do unconditional WBINVD for bare-metal
 in stop_this_cpu()



On 1/06/2024 8:45 am, Tom Lendacky wrote:
> On 5/22/24 21:49, Huang, Kai wrote:
>> On 18/04/2024 11:48 pm, Kai Huang wrote:
>>> TL;DR:
>>>
>>> Change stop_this_cpu() to do an unconditional WBINVD on bare metal
>>> to cover kexec support for both AMD SME and Intel TDX.  There _was_
>>> an issue that prevented doing so, but it has now been fixed.
>>>
>>> Long version:
>>>
>>> Both AMD SME and Intel TDX can leave caches in an incoherent state due
>>> to memory encryption, which can lead to silent memory corruption during
>>> kexec.  To address this issue, it is necessary to flush the caches
>>> before jumping to the second kernel.
>>>
>>> Currently, the kernel only performs WBINVD in stop_this_cpu() when SME
>>> is supported by hardware.  To support TDX without adding yet another
>>> vendor-specific check, it is proposed to perform WBINVD unconditionally.
>>> Kexec() is a slow path, and the additional WBINVD is acceptable for the
>>> sake of simplicity and maintainability.
>>>
>>
>> Hi Tom,
>>
>> May I ask how SME works with kdump in crash_kexec()?  Looking at
>> the code, AFAICT the crash_kexec() path doesn't use stop_this_cpu() to
>> stop all other CPUs.  Instead, kdump_nmi_shootdown_cpus() is called to
>> send an NMI to remote CPUs, and crash_nmi_callback() is invoked to stop them.
>>
>> But crash_nmi_callback() doesn't invoke WBINVD for SME AFAICT.  It
>> does call the kdump_nmi_callback() callback, where a WBINVD is
>> performed for the SNP host:
>>
>> void kdump_sev_callback(void)
>> {
>>          /*
>>           * Do wbinvd() on remote CPUs when SNP is enabled in order to
>>           * safely do SNP_SHUTDOWN on the local CPU.
>>           */
>>          if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
>>                  wbinvd();
>> }
>>
>> So, if I read this correctly, what's the reason the WBINVD is skipped
>> for SME in the crash_kexec() case?
> 
> The system is rebooted after a crash and doesn't continue directly on 
> into a new kernel.
> 

What about the kdump kernel itself?  Could the stale cachelines
potentially corrupt it?

And what about /proc/vmcore, which reflects the system RAM used by the
first (crashed) kernel?  Is it OK to have stale cachelines there?