lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250630084808.GH1613376@noisy.programming.kicks-ass.net>
Date: Mon, 30 Jun 2025 10:48:08 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Shashank Balaji <shashank.mahadasyam@...y.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
	Paul Walmsley <paul.walmsley@...ive.com>,
	Palmer Dabbelt <palmer@...belt.com>,
	Albert Ou <aou@...s.berkeley.edu>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org,
	Alexandre Ghiti <alex@...ti.fr>, linux-riscv@...ts.infradead.org,
	Sia Jee Heng <jeeheng.sia@...rfivetech.com>,
	James Morse <james.morse@....com>,
	Nicholas Piggin <npiggin@...il.com>,
	linux-arm-kernel@...ts.infradead.org,
	Rahul Bukte <rahul.bukte@...y.com>,
	Daniel Palmer <daniel.palmer@...y.com>,
	Shinya Takumi <shinya.takumi@...y.com>
Subject: Re: [RFC PATCH] kernel/cpu: in freeze_secondary_cpus() ensure
 primary cpu is of domain type

On Mon, Jun 30, 2025 at 05:20:59PM +0900, Shashank Balaji wrote:
> On an x86 machine, when cpu 0 is isolated with "isolcpus=", on initiating
> suspend to memory, a warning is triggered, followed by a kernel crash. This is
> on a defconfig + CONFIG_ENERGY_MODEL kernel:

> This happens because in order to offline the last secondary cpu, i.e. cpu 1,
> build_sched_domains() ends up being passed an empty cpumask, since the only remaining
> cpu (cpu 0) is isolated. It warns and fails, after which perf domains are
> are attempted to be built, which crashes the kernel. The same problem occurs
> during cpu hotplug, but that was fixed by
> commit 38685e2a0476127d ("cpu/hotplug: Don't offline the last non-isolated CPU").
> 
> Fix this by ensuring that the primary cpu, the last standing cpu, is of domain
> type, so that build_sched_domains() is not passed an empty cpumask.
> 

> Co-developed-by: Rahul Bukte <rahul.bukte@...y.com>
> Signed-off-by: Rahul Bukte <rahul.bukte@...y.com>
> Signed-off-by: Shashank Balaji <shashank.mahadasyam@...y.com>
> ---
>  kernel/cpu.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a59e009e0be4..d9167b0559a5 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1902,12 +1902,28 @@ int freeze_secondary_cpus(int primary)
>  
>  	cpu_maps_update_begin();
>  	if (primary == -1) {
> -		primary = cpumask_first(cpu_online_mask);
> -		if (!housekeeping_cpu(primary, HK_TYPE_TIMER))
> -			primary = housekeeping_any_cpu(HK_TYPE_TIMER);
> +		primary = cpumask_first_and_and(cpu_online_mask,
> +								housekeeping_cpumask(HK_TYPE_TIMER),
> +								housekeeping_cpumask(HK_TYPE_DOMAIN));

That's terrible indenting, please align after the opening bracket like:

		primary = cpumask_first_and_and(cpu_online_mask,
						housekeeping_cpumask(HK_TYPE_TIMER),
						housekeeping_cpumask(HK_TYPE_DOMAIN));

Also, IIRC HK_TYPE_HRTIMER is deprecated and should be something like
HK_TYPE_NOISE or somesuch. Frederic?

> +		if (primary >= nr_cpu_ids) {
> +			error = -ENODEV;
> +			pr_err("No suitable primary CPU found. Ensure at least one non-isolated, non-nohz_full CPU is online\n");
> +			goto abort;
> +		}
>  	} else {
> -		if (!cpu_online(primary))
> -			primary = cpumask_first(cpu_online_mask);
> +		if (!cpu_online(primary)) {
> +			primary = cpumask_first_and(cpu_online_mask,
> +								housekeeping_cpumask(HK_TYPE_DOMAIN));

Indenting again.

> +			if (primary >= nr_cpu_ids) {
> +				error = -ENODEV;
> +				pr_err("No suitable primary CPU found. Ensure at least one non-isolated CPU is online\n");
> +				goto abort;
> +			}
> +		} else if (!housekeeping_cpu(primary, HK_TYPE_DOMAIN)) {
> +			error = -ENODEV;
> +			pr_err("Primary CPU %d should not be isolated\n", primary);
> +			goto abort;
> +		}
>  	}
>  
>  	/*
> @@ -1943,6 +1959,7 @@ int freeze_secondary_cpus(int primary)
>  	else
>  		pr_err("Non-boot CPUs are not disabled\n");
>  
> +abort:
>  	/*
>  	 * Make sure the CPUs won't be enabled by someone else. We need to do
>  	 * this even in case of failure as all freeze_secondary_cpus() users are


Also; doesn't the above boil down to something like:

	if (primary == -1) {
		primary = cpumask_first_and_and(cpu_online_mask,
						housekeeping_cpumask(HK_TYPE_TIMER),
						housekeeping_cpumask(HK_TYPE_DOMAIN));
	} if (!cpu_online(primary)) {
		primary = cpumask_first_and(cpu_online_mask,
					    housekeeping_cpumask(HK_TYPE_DOMAIN));
	} 

	if (primary >= nr_cpu_ids || !housekeeping_cpu(primary, HK_TYPE_DOMAIN)) {
		error = -ENODEV;
		pr_err("Primary CPU %d should not be isolated\n", primary);
		goto abort;
	}

Yes, this has less error string variation, but the code is simpler.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ