Message-ID: <20240702084418.GB11386@noisy.programming.kicks-ass.net>
Date: Tue, 2 Jul 2024 10:44:18 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Yang Yingliang <yangyingliang@...weicloud.com>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	bristot@...hat.com, vschneid@...hat.com, tglx@...utronix.de,
	yu.c.chen@...el.com, tim.c.chen@...ux.intel.com,
	yangyingliang@...wei.com, liwei391@...wei.com
Subject: Re: [PATCH resend] sched/smt: fix unbalance sched_smt_present dec/inc

On Tue, Jul 02, 2024 at 04:11:28PM +0800, Yang Yingliang wrote:
> From: Yang Yingliang <yangyingliang@...wei.com>
> 
> I got the following warning while running a stress test:
> 
> jump label: negative count!
> WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> Call Trace:
>  <TASK>
>  __static_key_slow_dec_cpuslocked+0x16/0x70
>  sched_cpu_deactivate+0x26e/0x2a0
>  cpuhp_invoke_callback+0x3ad/0x10d0
>  cpuhp_thread_fun+0x3f5/0x680
>  smpboot_thread_fn+0x56d/0x8d0
>  kthread+0x309/0x400
>  ret_from_fork+0x41/0x70
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
> 
> When cpuset_cpu_inactive() fails in sched_cpu_deactivate(), the CPU
> offline fails, but sched_smt_present has already been decremented
> before the call to cpuset_cpu_inactive(). This leads to unbalanced
> dec/inc, so fix it by incrementing sched_smt_present in the error path.
> 
> Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> Reviewed-by: Chen Yu <yu.c.chen@...el.com>
> Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
> Signed-off-by: Yang Yingliang <yangyingliang@...wei.com>
> ---
>  kernel/sched/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bcf2c4cc0522..5ab6717b57e0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9756,6 +9756,10 @@ int sched_cpu_deactivate(unsigned int cpu)
>  	sched_update_numa(cpu, false);
>  	ret = cpuset_cpu_inactive(cpu);
>  	if (ret) {
> +#ifdef CONFIG_SCHED_SMT
> +		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> +			static_branch_inc_cpuslocked(&sched_smt_present);
> +#endif
>  		balance_push_set(cpu, false);
>  		set_cpu_active(cpu, true);
>  		sched_update_numa(cpu, true);

Yes, this does indeed appear to be needed. However, when I look at
what else happens before this failure, should we not also call
set_rq_online() and things like that?

That is, can we rework things to be less fragile by sharing code between
this error path and sched_cpu_activate()?
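
To make the shape of that rework concrete, here is a minimal user-space
sketch, not kernel code. It assumes the deactivate path has already
cleared the active bit, taken the runqueue offline and dropped the SMT
count by the time the cpuset step fails. Every name in it (struct
toy_sched, toy_bring_up(), toy_cpu_activate(), toy_cpu_deactivate()) is
hypothetical and only stands in for the real scheduler state and hooks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
struct toy_sched {
	int  smt_present;  /* stands in for the sched_smt_present key count */
	bool rq_online;    /* stands in for the runqueue online state */
	bool cpu_active;   /* stands in for the CPU's active bit */
	int  smt_siblings; /* stands in for cpumask_weight(cpu_smt_mask(cpu)) */
};

/* Steps shared by activation and by the deactivation error path. */
static void toy_bring_up(struct toy_sched *s)
{
	if (s->smt_siblings == 2)
		s->smt_present++;
	s->cpu_active = true;
	s->rq_online = true;
}

static int toy_cpu_activate(struct toy_sched *s)
{
	toy_bring_up(s);
	return 0;
}

/* cpuset_fails simulates cpuset_cpu_inactive() returning an error. */
static int toy_cpu_deactivate(struct toy_sched *s, bool cpuset_fails)
{
	s->cpu_active = false;
	s->rq_online = false;
	if (s->smt_siblings == 2) {
		/* An unbalanced decrement here is the "negative count" case. */
		assert(s->smt_present > 0);
		s->smt_present--;
	}

	if (cpuset_fails) {
		/* Undo everything done above through the shared helper. */
		toy_bring_up(s);
		return -1;
	}
	return 0;
}
```

With a single helper, a new teardown step only needs its counterpart
added in one place, so the error path cannot drift out of sync with
activation the way the missing increment did.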
