Date:   Mon, 11 Oct 2021 17:12:43 -0700
From:   Josh Don <joshdon@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Vineeth Pillai <vineethrp@...il.com>,
        Hao Luo <haoluo@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/core: forced idle accounting

On Sat, Oct 9, 2021 at 8:55 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Oct 07, 2021 at 05:08:25PM -0700, Josh Don wrote:
> > Adds accounting for "forced idle" time, which is time where a cookie'd
> > task forces its SMT sibling to idle, despite the presence of runnable
> > tasks.
> >
> > Forced idle time is one means to measure the cost of enabling core
> > scheduling (ie. the capacity lost due to the need to force idle).
>
> It seems an excessive amount of code for what it says to do.

I think I can cut some of that down by simplifying the SMT>2 case :)

>
> > +     smt_count = cpumask_weight(smt_mask);
>
> That's a fairly expensive operation to find a number that's going to be
> the same over and over and over...

Per Tao's suggestion, the nr_running and nr_forced_idle can be
computed and cached in pick(). Then there won't be any extra overhead
here, other than a potential division when SMT>2.

> > +     if (smt_count > 2) {
> > +             unsigned int nr_forced_idle = 0, nr_running = 0;
> > +
> > +             for_each_cpu(i, smt_mask) {
> > +                     rq_i = cpu_rq(i);
> > +                     p = rq_i->core_pick ?: rq_i->curr;
> > +
> > +                     if (p != rq_i->idle)
> > +                             nr_running++;
> > +                     else if (rq_i->nr_running)
> > +                             nr_forced_idle++;
> > +             }
> > +
> > +             if (WARN_ON_ONCE(!nr_running)) {
> > +                     /* can't be forced idle without a running task */
> > +             } else {
> > +                     delta *= nr_forced_idle;
> > +                     delta /= nr_running;
> > +             }
>
> Now the comment sayeth:
>
> > +     /*
> > +      * For larger SMT configurations, we need to scale the charged
> > +      * forced idle amount since there can be more than one forced idle
> > +      * sibling and more than one running cookied task.
> > +      */
>
> But why?

We scale by the number of cpus actually forced idle, since we don't
want to falsely over- or under-charge forced idle time (defined
strictly as time where we have a runnable task but idle the cpu). The
more important scaling here, though, is the division by the number of
running entities. This is done so that the aggregate amount of forced
idle charged over some group of threads makes sense. I.e., if we have
a cpu with SMT8 and a group of 7 threads sharing a cookie, we don't
want to accrue 7 units of forced idle time per unit time while the
8th SMT is forced idle.

> > +     }
> > +
> > +     for_each_cpu(i, smt_mask) {
> > +             rq_i = cpu_rq(i);
> > +             p = rq_i->core_pick ?: rq_i->curr;
> > +
> > +             if (!p->core_cookie)
> > +                     continue;
> > +
> > +             p->core_forceidle_sum += delta;
> > +
> > +             /* Optimize for common case. */
> > +             if (smt_count == 2)
> > +                     break;
> > +     }
> > +}
