Message-ID: <1db04264f6ec2d0b1624f9b1f528538174fa95ee.camel@linux.intel.com>
Date: Thu, 07 Sep 2023 15:11:42 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Chen Yu <yu.c.chen@...el.com>,
Yang Yingliang <yangyingliang@...wei.com>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com, tglx@...utronix.de
Subject: Re: [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc
On Mon, 2023-09-04 at 10:26 +0800, Chen Yu wrote:
> Hi Yingliang,
>
> On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> > I got the following warn report while doing stress test:
> >
>
> May I know if the test runs many deadline tasks while offlining CPUs,
> so as to trigger the failure case where removing one CPU leaves us below
> the total allocated bandwidth?
>
> > jump label: negative count!
> > WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> > Call Trace:
> > <TASK>
> > __static_key_slow_dec_cpuslocked+0x16/0x70
> > sched_cpu_deactivate+0x26e/0x2a0
> > cpuhp_invoke_callback+0x3ad/0x10d0
> > cpuhp_thread_fun+0x3f5/0x680
> > smpboot_thread_fn+0x56d/0x8d0
> > kthread+0x309/0x400
> > ret_from_fork+0x41/0x70
> > ret_from_fork_asm+0x1b/0x30
> > </TASK>
> >
> > Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
>
> s/Becaus/Because/
>
> > the cpu offline failed, but sched_smt_present is decreased before
s/decreased/decremented
to be precise
> > calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix
"leads unbalance dec/inc" should read "leads to unbalanced dec/inc"
>
> s/calling sched_cpu_deactivate/calling cpuset_cpu_inactive() ?
> > it by increasing sched_smt_present in the error path.
s/increasing/incrementing
> >
> > Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> > Signed-off-by: Yang Yingliang <yangyingliang@...wei.com>
> > ---
> > kernel/sched/core.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2299a5cfbfb9..b7ef2df36b75 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
> > sched_update_numa(cpu, false);
> > ret = cpuset_cpu_inactive(cpu);
> > if (ret) {
> > +#ifdef CONFIG_SCHED_SMT
> > + if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> > + static_branch_inc_cpuslocked(&sched_smt_present);
> > +#endif
>
> While checking the code, it seems that the core scheduling also missed
> the error path, maybe we should also invoke sched_core_cpu_starting() to
> restore the context. I'll have a check.
>
> Other than above typo, it looks good to me,
>
> Reviewed-by: Chen Yu <yu.c.chen@...el.com>
>
Other than some minor nits to commit log wording,
Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
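
For anyone following the thread, here is a minimal userspace sketch of the
imbalance described in the commit log. It is not kernel code: the plain
integer smt_present stands in for the sched_smt_present static key, and
model_deactivate(), run_scenario() and siblings_online are hypothetical names
made up for illustration. It assumes a single core with two SMT siblings and
an offline attempt that fails once (as when cpuset_cpu_inactive() returns an
error) and is then retried successfully.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sched_smt_present static key count. */
static int smt_present;

/* Hypothetical model: number of online SMT siblings on one core. */
static int siblings_online;

/*
 * Models the ordering in sched_cpu_deactivate(): the counter is
 * decremented first, then a step that may fail is executed.
 */
static int model_deactivate(bool cpuset_fails, bool with_fix)
{
	assert(siblings_online > 0);

	if (siblings_online == 2)
		smt_present--;		/* dec happens before the fallible step */

	if (cpuset_fails) {		/* stands in for cpuset_cpu_inactive() failing */
		if (with_fix && siblings_online == 2)
			smt_present++;	/* the patch: undo the dec on the error path */
		return -1;		/* offline aborted, sibling stays online */
	}

	siblings_online--;		/* offline succeeded */
	return 0;
}

/* One failed offline attempt followed by a successful retry. */
static int run_scenario(bool with_fix)
{
	smt_present = 1;		/* one core with SMT present */
	siblings_online = 2;

	model_deactivate(true, with_fix);	/* first attempt fails */
	model_deactivate(false, with_fix);	/* retry succeeds */

	return smt_present;
}
```

In this model run_scenario(false) leaves the counter negative, which is the
condition the "jump label: negative count!" warning reports, while
run_scenario(true) ends at zero because the error path restores the earlier
decrement.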