linux-kernel - Re: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq


Message-ID: <CAB8ipk8xXWzc_PurHwVPd9-azN4B5OD=MYQP+Oze1kmbom0avQ@mail.gmail.com>
Date:   Fri, 18 Nov 2022 20:08:54 +0800
From:   Xuewen Yan <xuewen.yan94@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Xuewen Yan <xuewen.yan@...soc.com>, peterz@...radead.org,
        mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, ke.wang@...soc.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/rt: Use cpu_active_mask to prevent
 rto_push_irq_work's dead loop

On Fri, Nov 18, 2022 at 6:16 AM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@...soc.com> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >       int next;
> >       int cpu;
> > +     struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
Ha, I did not take that into account. Thanks!

>
> >
> >       /*
> >        * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> >               /* When rto_cpu is -1 this acts like cpumask_first() */
> >               cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >
> > +             cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +             if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +                 cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +                     break;
> > +
>
> Kill the above.
>
> >               rd->rto_cpu = cpu;
> >
> >               if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
>                         if (!cpumask_test_cpu(cpu, cpu_active_mask))
>                                 continue;
>                         return cpu;
>                 }
>
> ?
Let's consider this scenario: cpu_online_mask is 0x03 (cpu0 and cpu1),
cpu_active_mask is 0x01 (cpu0 only), rto_cpu is cpu0, rto_mask is 0x01,
and the irq_work is running on cpu0. After the first pass of the loop,
rto_cpu becomes -1, but because rto_loop has not yet caught up with
rto_loop_next, the loop runs again. On that next pass, since rto_cpu is
-1, the next rto cpu is still cpu0, and cpu0 passes the active check.
As a result, cpu0 would push RT tasks to cpu1 (an inactive CPU) while
running in the irq_work.

So we should check whether the current CPU (the only active CPU) is the
CPU chosen for the next loop.

Thanks!

>
> -- Steve