Message-ID: <80c16fe8-471a-4e31-987c-15d377f8f165@intel.com>
Date: Thu, 5 Feb 2026 18:14:39 -0800
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: Borislav Petkov <bp@...en8.de>, "Kaplan, David" <David.Kaplan@....com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "peterz@...radead.org" <peterz@...radead.org>
Subject: Re: [PATCH 1/7] stop_machine: Introduce stop_machine_nmi()
On 2/2/2026 2:54 AM, Borislav Petkov wrote:
...
> @@ -174,8 +174,26 @@ struct multi_stop_data {
>
> enum multi_stop_state state;
> atomic_t thread_ack;
> +
> + bool use_nmi;
> +
> + /*
> + * cpumasks of CPUs on which to raise an NMI; used in the NMI
> + * stomp_machine variant. nmi_cpus_done is used for tracking
> + * when the NMI handler has executed successfully.
> + */
> + struct cpumask nmi_cpus;
> + struct cpumask nmi_cpus_done;
> +
> +};
Looks like every stop_machine() variant will then spend stack space on
these masks. It seems they could be cpumask_var_t instead.
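Roughly along these lines, I think (just a sketch; the allocation-failure
cleanup and the matching free_cpumask_var() calls are omitted):

	struct multi_stop_data {
		...
		cpumask_var_t		nmi_cpus;
		cpumask_var_t		nmi_cpus_done;
	};

	if (use_nmi) {
		if (!alloc_cpumask_var(&msdata.nmi_cpus, GFP_KERNEL) ||
		    !zalloc_cpumask_var(&msdata.nmi_cpus_done, GFP_KERNEL))
			return -ENOMEM;
		cpumask_copy(msdata.nmi_cpus, cpus);
	}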
Alternatively, to simplify this further, a per-CPU variable could
achieve the same if I understand correctly:
	struct stop_machine_nmi_ctrl {
		...
		bool done;
	};
Then,
	for_each_cpu(cpu, cpus)
		per_cpu(stop_machine_nmi_ctrl.done, cpu) = false;
	...
	ret = stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
	...
	for_each_cpu(cpu, cpus) {
		if (!per_cpu(stop_machine_nmi_ctrl.done, cpu)) {
			pr_err("Some CPUs didn't run ...\n");
			return -EINVAL;
		}
	}
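The NMI handler side would then only need to flag its own CPU once it
has run @fn, something like (just a sketch):

	this_cpu_write(stop_machine_nmi_ctrl.done, true);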
Or, I guess even the msdata pointer alone could do that too -- set it
per CPU beforehand and check afterwards whether the NMI handler has
cleared it to NULL.
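Roughly, with a made-up per-CPU pointer name just for illustration:

	static DEFINE_PER_CPU(struct multi_stop_data *, nmi_msdata);

	/* before stop_cpus(): */
	for_each_cpu(cpu, cpus)
		per_cpu(nmi_msdata, cpu) = &msdata;

	/* in the NMI handler, once it has run @fn: */
	this_cpu_write(nmi_msdata, NULL);

	/* afterwards, a non-NULL pointer means that CPU never got there: */
	for_each_cpu(cpu, cpus) {
		if (per_cpu(nmi_msdata, cpu))
			return -EINVAL;
	}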
> +static int __stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
> + const struct cpumask *cpus, bool use_nmi)
> {
> struct multi_stop_data msdata = {
> .fn = fn,
> .data = data,
> .num_threads = num_online_cpus(),
> .active_cpus = cpus,
> + .use_nmi = use_nmi,
> };
> + int ret, cpu;
> +
> + if (use_nmi) {
> + cpumask_copy(&msdata.nmi_cpus, cpus);
> + cpumask_clear(&msdata.nmi_cpus_done);
> + }
>
> lockdep_assert_cpus_held();
>
> @@ -617,7 +677,32 @@ int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
>
> /* Set the initial state and stop all online cpus. */
> set_state(&msdata, MULTI_STOP_PREPARE);
> - return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
> + ret = stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
> +
> + if (!use_nmi)
> + return ret;
> +
> + if (!cpumask_equal(cpus, &msdata.nmi_cpus_done)) {
> + pr_err("Some CPUs didn't run the stomp_machine NMI handler\n");
> + return -EINVAL;
> + } else {
> + for_each_cpu(cpu, cpus)
> + ret |= per_cpu(stop_machine_nmi_ctrl.err, cpu);
This error accumulation here makes sense to me, though.
Currently, stop_machine() documents:
/*
 ...
 * Return: 0 if all invocations of @fn return zero. Otherwise, the
 * value returned by an arbitrarily chosen member of the set of calls to
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 * @fn that returned non-zero.
 */
This behavior dates back to:
commit 1142d810298e694 ("cpu_stop: implement stop_cpu[s]()")
where cpu_stopper_thread() does:
	struct cpu_stop_done *done = work->done;
	...
	ret = fn(arg);
	if (ret)
		done->ret = ret;
So the return value is overwritten rather than accumulated, while struct
cpu_stop_done is shared, as its comment notes:
/*
 * Structure to determine completion condition and record errors. May
 * be shared by works on different cpus.
 */
I don't know whether that was an intentional design choice or not. But
at least the NMI variant might have a slightly different semantic in
this regard.