linux-kernel - Re: [PATCH v3 1/6] x86/sme: Use percpu boolean to control wbinvd during kexec

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0b8948ed672ebf6701ddc350914e4e325032ad87.camel@intel.com>
Date: Fri, 27 Jun 2025 00:30:06 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Hansen, Dave" <dave.hansen@...el.com>, "Edgecombe, Rick P"
	<rick.p.edgecombe@...el.com>, "bp@...en8.de" <bp@...en8.de>,
	"peterz@...radead.org" <peterz@...radead.org>, "hpa@...or.com"
	<hpa@...or.com>, "mingo@...hat.com" <mingo@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "thomas.lendacky@....com" <thomas.lendacky@....com>
CC: "seanjc@...gle.com" <seanjc@...gle.com>, "x86@...nel.org"
	<x86@...nel.org>, "sagis@...gle.com" <sagis@...gle.com>, "Chatre, Reinette"
	<reinette.chatre@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>, "Williams, Dan J"
	<dan.j.williams@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "ashish.kalra@....com" <ashish.kalra@....com>,
	"nik.borisov@...e.com" <nik.borisov@...e.com>
Subject: Re: [PATCH v3 1/6] x86/sme: Use percpu boolean to control wbinvd
 during kexec

On Thu, 2025-06-26 at 18:42 +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-06-26 at 10:59 -0700, Rick Edgecombe wrote:
> > Since the race is related to stop_this_cpu() it doesn't affect it. But it does
> > mean that the bring up CPU may not flush the cache if takes a kdump kexec before
> > the per-cpu var is set? Of course the existing logic doesn't trigger until
> > cpuinfo_x86 is populated. What is the consequence?
> 
> Oh, doh! This stuff all happens before kdump could load. Right?

I'll replace kdump with normal kexec in the above context, since kdump case
(crash_kexec()) uses a different path from normal kexec when stopping remote
CPUs.  And there's no WBINVD involved in the carsh dump code path for SME.

The reason I guess is reading encrypted memory from /proc/vmcore is
meaningless anyway therefore even memory corruption can happen it is fine.

Now for normal kexec:

For SME this boolean is never cleared once set when the CPU is brought up
for the first time.  So in practice yes when the kexec is performed the
boolean is already set for all CPUs.

But, you are right there's a case that 'maxcpus' kernel commandline is used
which results in only part of CPUs are brought up during boot, and the user
can manually online the rest CPUs at runtime.

(Side: this 'maxcpus' should not be used for modern machine, see the end).

In this case, IIUC, _theoretically_, the native_stop_other_cpus() could race
with CPU onlining, because I didn't see CPU hotplug is explicitly disabled
before that.

But, the native_stop_other_cpus() only sends IPIs to the CPUs which is
already set in cpu_online_mask, and when one CPU is already in
cpu_online_mask, the code to set the boolean has already been done, so the
WBINVD will be done for that.

The only possible issue that I could find is one CPU becomes online after
cpus_stop_mask is already populated:

	CPU 0						CPU 1

							start_secondary()

	cpumask_copy(&cpus_stop_mask, cpu_online_mask);
	cpumask_clear_cpu(this_cpu, &cpus_stop_mask);		
							...
							ap_starting();
							...
							set_cpu_online();
							...

	if (!cpumask_empty(&cpus_stop_mask)) {                                    
                apic_send_IPI_allbutself(REBOOT_VECTOR);
		...

But again this issue (assuming it exists) is an existing issue which is not
related to this patch.

And I am not 100% sure whether this issue exists, since allowing CPU hotplug
during kexec doesn't seem reasonable to me at least on x86, despite it is
indeed enabled kernel_kexec() common code:

        /*
         * migrate_to_reboot_cpu() disables CPU hotplug assuming that
         * no further code needs to use CPU hotplug (which is true in
         * the reboot case). However, the kexec path depends on using
         * CPU hotplug again; so re-enable it here.
         */
         cpu_hotplug_enable();
         pr_notice("Starting new kernel\n");
         machine_shutdown();

I tried to git blame to find clue but failed since the history is lost
during file move/renaming etc.  I suspect it is for other ARCHs.

At last, I believe 'maxcpus' should not be used for modern platforms anyway
due to below:

static inline bool cpu_bootable(unsigned int cpu)
{
	...

        /*
         * On x86 it's required to boot all logical CPUs at least once so
         * that the init code can get a chance to set CR4.MCE on each
         * CPU. Otherwise, a broadcasted MCE observing CR4.MCE=0b on any
         * core will shutdown the machine.
         */
        return !cpumask_test_cpu(cpu, &cpus_booted_once_mask);
}

So using boolean for SME really shouldn't have any issue.