Message-ID: <1db04264f6ec2d0b1624f9b1f528538174fa95ee.camel@linux.intel.com>
Date:   Thu, 07 Sep 2023 15:11:42 -0700
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Chen Yu <yu.c.chen@...el.com>,
        Yang Yingliang <yangyingliang@...wei.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, tglx@...utronix.de
Subject: Re: [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc

On Mon, 2023-09-04 at 10:26 +0800, Chen Yu wrote:
> Hi Yingliang,
> 
> On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> > I got the following warn report while doing stress test:
> > 
> 
> May I know if the test is to run many deadline tasks while offline the CPUs,
> so as to trigger the failing case that removing one CPU gets us below the total
> allocated bandwidth?
> 
> > jump label: negative count!
> > WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> > Call Trace:
> >  <TASK>
> >  __static_key_slow_dec_cpuslocked+0x16/0x70
> >  sched_cpu_deactivate+0x26e/0x2a0
> >  cpuhp_invoke_callback+0x3ad/0x10d0
> >  cpuhp_thread_fun+0x3f5/0x680
> >  smpboot_thread_fn+0x56d/0x8d0
> >  kthread+0x309/0x400
> >  ret_from_fork+0x41/0x70
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> > 
> > Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
> 
> s/Becaus/Because/
> 
> > the cpu offline failed, but sched_smt_present is decreased before

s/decreased/decremented/, to be precise

> > calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix

s|leads unbalance dec/inc|leads to unbalanced dec/inc|
> 
> s/calling sched_cpu_deactivate/calling cpuset_cpu_inactive() ?
> > it by increasing sched_smt_present in the error path.

s/increasing/incrementing/

> > 
> > Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> > Signed-off-by: Yang Yingliang <yangyingliang@...wei.com>
> > ---
> >  kernel/sched/core.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2299a5cfbfb9..b7ef2df36b75 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
> >  	sched_update_numa(cpu, false);
> >  	ret = cpuset_cpu_inactive(cpu);
> >  	if (ret) {
> > +#ifdef CONFIG_SCHED_SMT
> > +		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> > +			static_branch_inc_cpuslocked(&sched_smt_present);
> > +#endif
> 
> While checking the code, it seems that the core scheduling also missed
> the error path, maybe we should also invoke sched_core_cpu_starting() to
> restore the context. I'll have a check.
> 
> Other than above typo, it looks good to me,
> 
> Reviewed-by: Chen Yu <yu.c.chen@...el.com>
> 

Other than some minor nits to commit log wording,

Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
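
For anyone following the thread, the imbalance discussed above can be modelled in userspace. This is only a rough sketch, not kernel code: a plain int stands in for the sched_smt_present static key, and smt_present, fake_cpuset_cpu_inactive(), model_cpu_activate(), model_cpu_deactivate() and run_failed_offlines() are all hypothetical names made up for the illustration. It assumes the activate path increments the key under the same "two SMT siblings" condition that the patch uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain counter standing in for the sched_smt_present static key. */
static int smt_present;

/* Hypothetical stand-in for cpuset_cpu_inactive(); nonzero means failure.
 * The value -1 is arbitrary, not the kernel's actual error code. */
static int fake_cpuset_cpu_inactive(bool fail)
{
	return fail ? -1 : 0;
}

/* Model of the online path: count the core once it has two SMT siblings. */
static void model_cpu_activate(int siblings)
{
	if (siblings == 2)
		smt_present++;
}

/* Model of the offline path, with and without the error-path fix. */
static int model_cpu_deactivate(int siblings, bool fail, bool with_fix)
{
	int ret;

	if (siblings == 2)
		smt_present--;		/* decremented before the cpuset check */

	ret = fake_cpuset_cpu_inactive(fail);
	if (ret) {
		/* Offline failed, so the CPU stays up: undo the decrement. */
		if (with_fix && siblings == 2)
			smt_present++;
		return ret;
	}
	return 0;
}

/* Bring one 2-sibling CPU up, then fail to offline it 'attempts' times. */
static int run_failed_offlines(int attempts, bool with_fix)
{
	assert(attempts >= 0);

	smt_present = 0;
	model_cpu_activate(2);
	for (int i = 0; i < attempts; i++)
		model_cpu_deactivate(2, true, with_fix);

	return smt_present;
}
```

Without the rollback, every failed offline attempt leaks one decrement, so repeated failures under a stress test eventually push the count below zero, which is the situation the "jump label: negative count!" warning reports. With the rollback, the count after any number of failed attempts is the same as before the first one.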
