Message-ID: <YWRqbKc0nY29/Qiz@geo.homenetwork>
Date: Tue, 12 Oct 2021 00:46:36 +0800
From: Tao Zhou <tao.zhou@...ux.dev>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Tao Zhou <tao.zhou@...ux.dev>
Subject: Re: [PATCH] sched/fair: Use this_sd->weight to calculate span_avg
Hi Steven,
On Mon, Oct 11, 2021 at 10:58:02AM -0400, Steven Rostedt wrote:
> On Sun, 10 Oct 2021 02:10:55 +0800
> Tao Zhou <tao.zhou@...ux.dev> wrote:
>
> > avg_idle and avg_cost come from this_rq and this_sd. I think we
> > should use this_sd->weight to calculate and estimate the number
> > of CPUs to loop over in the target domain.
>
> If that's the case, then shouldn't the CPUs to be checked come from this_sd
> as well? I mean, at the beginning of the function we have:
>
> this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
> if (!this_sd)
> return -1;
>
> cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>
> Where "cpus" comes from sd, and not this_sd.
Thank you for the reply.
The cycles are spent on this CPU (and in this domain), if I am not
wrong. If T1 (on this CPU) wants to wake up T2 on another CPU, the
kernel (running on this CPU) should make sure it does not spend too
much time searching for idle siblings. span_avg (derived from avg_idle)
and avg_cost are the two factors compared for that: avg_idle is the
average idle time of this CPU's runqueue, and avg_cost is the average
scan cost recorded the last time this domain was scanned. Both values
are historical averages, so we use the history to estimate whether the
domain CPU search we do *now* will leave this CPU (and this domain)
too busy. This is what I want to say. Not sure yet.
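
To put made-up numbers on that comparison, here is a small userspace
sketch (not kernel code); the variable names mirror this_sd->span_weight,
this_rq->wake_avg_idle and this_sd->avg_scan_cost, the values are
invented, and the else branch (nr = 4) is from my reading of the kernel
source rather than the hunk quoted below:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Invented numbers, in ns, standing in for the kernel fields. */
	uint64_t span_weight = 16;      /* this_sd->span_weight       */
	uint64_t avg_idle    = 500;     /* this_rq->wake_avg_idle     */
	uint64_t avg_cost    = 600 + 1; /* this_sd->avg_scan_cost + 1 */
	uint64_t span_avg    = span_weight * avg_idle;
	int nr = 4;                     /* lower bound when the idle budget is small */

	if (span_avg > 4 * avg_cost)
		nr = (int)(span_avg / avg_cost); /* div_u64() in the kernel */

	printf("scan at most %d CPUs in the LLC domain\n", nr);
	return 0;
}

With these numbers span_avg = 8000 > 4 * 601, so nr = 13 and the loop
further below gives up after 13 CPUs.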
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index f6a05d9b5443..7fab7b70814c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6300,7 +6300,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > avg_idle = this_rq->wake_avg_idle;
> > avg_cost = this_sd->avg_scan_cost + 1;
> >
> > - span_avg = sd->span_weight * avg_idle;
> > + span_avg = this_sd->span_weight * avg_idle;
> > if (span_avg > 4*avg_cost)
> > nr = div_u64(span_avg, avg_cost);
> > else
>
>
> And after this code, the nr that is determined from the above, is for
> limiting the looping over those CPUs from sd, not this_sd:
>
> for_each_cpu_wrap(cpu, cpus, target + 1) {
> if (has_idle_core) {
> i = select_idle_core(p, cpu, cpus, &idle_cpu);
> if ((unsigned int)i < nr_cpumask_bits)
> return i;
>
> } else {
> if (!--nr)
> return -1;
> idle_cpu = __select_idle_cpu(cpu, p);
> if ((unsigned int)idle_cpu < nr_cpumask_bits)
> break;
> }
> }
>
> I'm guessing there's nothing wrong here. But, I don't fully understand it
> myself.
When I replied to Barry I did not say that I had missed that AND
operation. Here is another go.

@cpus is per-CPU data and is used with irqs disabled. I am not sure
whether an irq-disabled section can be preempted on RT; if that is
possible, p->cpus_ptr is not safe here. Based on the code and the
comments in __do_set_cpus_allowed(), ->pi_lock is not necessarily taken
to change affinity (e.g. SCA_MIGRATE_DISABLE).

If p->cpus_ptr does not get changed (I must be right and wrong more
than 5 times): suppose @core is not in the cpus_ptr mask, BUT a cpu in
the SMT mask of @core is the one we used to call select_idle_core();
@core is in that SMT mask even though it is not allowed, and if we get
@*idle_cpu and @idle is true, we return @core. Another point I am not
sure about.

If p->cpus_ptr can change, e.g. just before the loop over @core's SMT
mask, it is possible that @*idle_cpu == -1 and every cpu in the SMT
mask of @core appears to be not allowed, and we still return @core.
This is the case I saw when I only looked at the select_idle_core()
semantics and missed that AND operation.
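
To make that second case concrete, here is a toy userspace model of
the SMT-mask walk as I read the select_idle_core() semantics (a
paraphrase, not kernel code); the CPU ids, idle states and affinity
flags are all invented:

#include <stdio.h>
#include <stdbool.h>

#define NR_SMT 2

struct cpu_state {
	int  id;
	bool idle;     /* stands in for available_idle_cpu()               */
	bool allowed;  /* stands in for cpumask_test_cpu(cpu, p->cpus_ptr) */
};

/*
 * Shape of the walk: remember an allowed idle sibling in *idle_cpu,
 * and return @core if the whole core is idle.
 */
static int toy_select_idle_core(struct cpu_state smt[NR_SMT], int core,
				int *idle_cpu)
{
	bool idle = true;
	int i;

	for (i = 0; i < NR_SMT; i++) {
		if (!smt[i].idle) {
			idle = false;
			break;
		}
		/* Only allowed CPUs are remembered as fallback idle CPUs. */
		if (*idle_cpu == -1 && smt[i].allowed)
			*idle_cpu = smt[i].id;
	}

	/* Every sibling idle: @core is returned, allowed or not. */
	return idle ? core : -1;
}

int main(void)
{
	/* Affinity changed just before the loop: nothing is allowed now. */
	struct cpu_state smt[NR_SMT] = {
		{ .id = 4, .idle = true, .allowed = false }, /* @core       */
		{ .id = 5, .idle = true, .allowed = false }, /* its sibling */
	};
	int idle_cpu = -1;
	int ret = toy_select_idle_core(smt, 4, &idle_cpu);

	printf("returned %d, idle_cpu %d\n", ret, idle_cpu); /* 4, -1 */
	return 0;
}

So in the model @*idle_cpu stays -1 and the fully idle @core is
returned even though it is no longer in the affinity mask; that is the
situation I was trying to describe.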
> -- Steve
Thanks,
Tao