Message-ID: <ZPVATM5xfmiFKWtk@chenyu5-mobl2>
Date: Mon, 4 Sep 2023 10:26:20 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Yang Yingliang <yangyingliang@...wei.com>
CC: <linux-kernel@...r.kernel.org>, <mingo@...hat.com>,
<peterz@...radead.org>, <juri.lelli@...hat.com>,
<vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
<rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
<bristot@...hat.com>, <vschneid@...hat.com>, <tglx@...utronix.de>
Subject: Re: [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc
Hi Yingliang,
On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> I got the following warn report while doing stress test:
>
May I know whether the test runs many deadline tasks while offlining CPUs,
so as to trigger the failure case where removing one CPU would leave us below
the total allocated bandwidth?
> jump label: negative count!
> WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> Call Trace:
> <TASK>
> __static_key_slow_dec_cpuslocked+0x16/0x70
> sched_cpu_deactivate+0x26e/0x2a0
> cpuhp_invoke_callback+0x3ad/0x10d0
> cpuhp_thread_fun+0x3f5/0x680
> smpboot_thread_fn+0x56d/0x8d0
> kthread+0x309/0x400
> ret_from_fork+0x41/0x70
> ret_from_fork_asm+0x1b/0x30
> </TASK>
>
> Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
s/Becaus/Because/
> the cpu offline failed, but sched_smt_present is decreased before
> calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix
s/calling sched_cpu_deactivate/calling cpuset_cpu_inactive() ?
> it by increasing sched_smt_present in the error path.
>
> Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> Signed-off-by: Yang Yingliang <yangyingliang@...wei.com>
> ---
> kernel/sched/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2299a5cfbfb9..b7ef2df36b75 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
> sched_update_numa(cpu, false);
> ret = cpuset_cpu_inactive(cpu);
> if (ret) {
> +#ifdef CONFIG_SCHED_SMT
> + if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> + static_branch_inc_cpuslocked(&sched_smt_present);
> +#endif
While checking the code, it seems that core scheduling also misses this
error path; maybe we should also invoke sched_core_cpu_starting() to
restore the context. I'll check.
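To make the imbalance concrete, here is a minimal user-space sketch of the pattern, not kernel code. The toy_* names, the struct and the scenario helper are all hypothetical; the only things taken from the patch are the ordering (the key is dropped before a step that can fail) and the fix (re-increment in the error path).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy stand-in for the sched_smt_present static key. */
struct toy_key {
	int count;       /* current reference count */
	bool underflow;  /* set when a decrement would go negative */
};

static void toy_key_inc(struct toy_key *key)
{
	key->count++;
}

static void toy_key_dec(struct toy_key *key)
{
	if (key->count == 0) {
		/* Models the "jump label: negative count!" warning. */
		key->underflow = true;
		return;
	}
	key->count--;
}

/*
 * Toy model of the deactivate path: drop the key first, then run a step
 * that may fail (standing in for cpuset_cpu_inactive()).
 * Returns 0 on success, -1 on failure.
 */
static int toy_cpu_deactivate(struct toy_key *key, bool step_fails,
			      bool restore_on_error)
{
	toy_key_dec(key);
	if (step_fails) {
		if (restore_on_error)
			toy_key_inc(key); /* undo the earlier decrement */
		return -1;
	}
	return 0;
}

/* Run "offline fails, then offline succeeds"; report whether we underflow. */
static bool toy_scenario_underflows(bool restore_on_error)
{
	struct toy_key key = { .count = 1, .underflow = false };

	toy_cpu_deactivate(&key, true, restore_on_error);  /* offline fails */
	toy_cpu_deactivate(&key, false, restore_on_error); /* offline succeeds */
	assert(key.count >= 0); /* the toy counter itself never goes negative */
	return key.underflow;
}
```

In this model, toy_scenario_underflows(false) hits the underflow on the second offline attempt, while toy_scenario_underflows(true) does not, which is the behaviour the added error-path increment is meant to give.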
Other than the typos above, it looks good to me,
Reviewed-by: Chen Yu <yu.c.chen@...el.com>
thanks,
Chenyu