linux-kernel - Re: [PATCH] sched/fair: Prefer idle CPU to cache affinity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210310055241.GO2028034@linux.vnet.ibm.com>
Date:   Wed, 10 Mar 2021 11:22:41 +0530
From:   Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Rik van Riel <riel@...riel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Valentin Schneider <valentin.schneider@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michael Neuling <mikey@...ling.org>,
        Gautham R Shenoy <ego@...ux.vnet.ibm.com>,
        Parth Shah <parth@...ux.ibm.com>
Subject: Re: [PATCH] sched/fair: Prefer idle CPU to cache affinity

* Vincent Guittot <vincent.guittot@...aro.org> [2021-03-08 14:52:39]:

> On Fri, 26 Feb 2021 at 17:41, Srikar Dronamraju
> <srikar@...ux.vnet.ibm.com> wrote:
> >

Thanks Vincent for your review comments.

> > +static int prefer_idler_llc(int this_cpu, int prev_cpu, int sync)
> > +{
> > +       struct sched_domain_shared *tsds, *psds;
> > +       int pnr_busy, pllc_size, tnr_busy, tllc_size, diff;
> > +
> > +       tsds = rcu_dereference(per_cpu(sd_llc_shared, this_cpu));
> > +       tnr_busy = atomic_read(&tsds->nr_busy_cpus);
> > +       tllc_size = per_cpu(sd_llc_size, this_cpu);
> > +
> > +       psds = rcu_dereference(per_cpu(sd_llc_shared, prev_cpu));
> > +       pnr_busy = atomic_read(&psds->nr_busy_cpus);
> > +       pllc_size = per_cpu(sd_llc_size, prev_cpu);
> > +
> > +       /* No need to compare, if both LLCs are fully loaded */
> > +       if (pnr_busy == pllc_size && tnr_busy == pllc_size)
> > +               return nr_cpumask_bits;
> > +
> > +       if (sched_feat(WA_WAKER) && tnr_busy < tllc_size)
> > +               return this_cpu;
> 
> Why have you chosen to favor this_cpu instead of prev_cpu unlike for wake_idle ?

At this point, we know the waker running on this_cpu and wakee which was
running on prev_cpu are affine to each other and this_cpu and prev_cpu dont
share cache. I chose to move them close to each other to benefit from the
cache sharing. Based on feedback from Peter and Rik, I made the check more
conservative i.e tnr_busy <= tllc_size/smt_weight (where smt_weight is the
cpumask weight of smt domain for this_cpu) i.e if we have a free core in
this llc domain, chose this_cpu.  select_idle_sibling() should pick an idle
cpu/core/smt within the llc domain for this_cpu.

Do you feel, this may not be the correct option?

We are also experimenting with another option, were we call prefer_idler_cpu
after wa_weight. I.e 
1. if wake_affine_weight choses this_cpu but llc in prev_cpu has an idle
smt/CPU but there are no idle smt/CPU in this_cpu, then chose idle smt/CPU
in prev_cpu
2. if wake_affine_weight choses nr_cpumask(aka prev_cpu) but llc in this_cpu
has an idle smt/CPU but there are no idle smt/CPU in prev_cpu, then chose
idle smt/CPU in this_cpu


> > +
> > +       /* For better wakeup latency, prefer idler LLC to cache affinity */
> > +       diff = tnr_busy * pllc_size - sync - pnr_busy * tllc_size;
> > +       if (!diff)
> > +               return nr_cpumask_bits;
> > +       if (diff < 0)
> > +               return this_cpu;
> > +
> > +       return prev_cpu;
> > +}
> > +
> >  static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> >                        int this_cpu, int prev_cpu, int sync)
> >  {
> > @@ -5877,6 +5907,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> >         if (sched_feat(WA_IDLE))
> >                 target = wake_affine_idle(this_cpu, prev_cpu, sync);
> >
> > +       if (sched_feat(WA_IDLER_LLC) && target == nr_cpumask_bits &&
> > +                               !cpus_share_cache(this_cpu, prev_cpu))
> > +               target = prefer_idler_llc(this_cpu, prev_cpu, sync);
> 
> could you use the same naming convention as others function ?
> wake_affine_llc as an example

I guess you meant s/prefer_idler_llc/wake_affine_llc/
Sure. I can modify.

> 
> > +
> >         if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
> >                 target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
> >
> > @@ -5884,8 +5918,11 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> >         if (target == nr_cpumask_bits)
> >                 return prev_cpu;
> >
> > -       schedstat_inc(sd->ttwu_move_affine);
> > -       schedstat_inc(p->se.statistics.nr_wakeups_affine);
> > +       if (target == this_cpu) {
> 
> How is this condition related to $subject ?

Before this change, wake_affine_weight and wake_affine_idle would either
return this_cpu or nr_cpumask_bits. Just before this check, we check if
target is nr_cpumask_bits and return prev_cpu. So the stats were only
incremented when target was this_cpu.

However with prefer_idler_llc, we may return this_cpu, prev_cpu or
nr_cpumask_bits. Now we only to update stats when we have chosen to migrate
the task to this_cpu. Hence I had this check.

If we use the slightly lazier approach which is check for wa_weight first
before wa_idler_llc, then we may not need this change at all.

-- 
Thanks and Regards
Srikar Dronamraju