Message-ID: <CAB8ipk8xXWzc_PurHwVPd9-azN4B5OD=MYQP+Oze1kmbom0avQ@mail.gmail.com>
Date: Fri, 18 Nov 2022 20:08:54 +0800
From: Xuewen Yan <xuewen.yan94@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Xuewen Yan <xuewen.yan@...soc.com>, peterz@...radead.org,
mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
vschneid@...hat.com, ke.wang@...soc.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/rt: Use cpu_active_mask to prevent
rto_push_irq_work's dead loop
On Fri, Nov 18, 2022 at 6:16 AM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@...soc.com> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >  	int next;
> >  	int cpu;
> > +	struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
Ha, I did not take it into account. Thanks!
>
> >
> >  	/*
> >  	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> >  		/* When rto_cpu is -1 this acts like cpumask_first() */
> >  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >
> > +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +			break;
> > +
>
> Kill the above.
>
> >  		rd->rto_cpu = cpu;
> >
> >  		if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
> 			if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 				continue;
> 			return cpu;
> 		}
>
> ?
Let's consider this scenario:
cpu_online_mask is 0x03 (cpu0 and cpu1), cpu_active_mask is 0x01 (cpu0
only), the only RT-overloaded CPU is cpu0 (rto_mask is 0x01), and the
irq_work runs on cpu0. At the end of the first pass, cpumask_next() finds
no further CPU, so rto_cpu is reset to -1. However, rto_loop has not yet
caught up with rto_loop_next, so rto_next_cpu() loops again, and because
rto_cpu is -1 the next RTO CPU is cpu0 once more. As a result, cpu0 keeps
pushing RT tasks to cpu1 (an inactive CPU) while running in the irq_work.
Your suggested check does not help here, because cpu0 itself is active.
So we should check whether the current CPU (the only active one) would be
selected again on the next loop.
Thanks!
>
> -- Steve