Message-ID: <20211028121530.GA19512@vingu-book>
Date: Thu, 28 Oct 2021 14:15:30 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/5] Improve newidle lb cost tracking and early abort
On Wednesday 27 Oct 2021 at 13:53:32 (-0700), Tim Chen wrote:
> On Wed, 2021-10-27 at 10:49 +0200, Vincent Guittot wrote:
> >
> > > Looking at the profile on update_blocked_averages a bit more,
> > > the majority of the call to update_blocked_averages
> > > happens in run_rebalance_domain. And we are not
> > > including that cost of update_blocked_averages for
> > > run_rebalance_domains in our current patch set. I think
> > > the patch set should account for that too.
> >
> > nohz_newidle_balance keeps using sysctl_sched_migration_cost to
> > trigger a _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
> > This would probably benefit to take into account the cost of
> > update_blocked_averages instead
> >
>
> For the case where
>
> this_rq->avg_idle < sysctl_sched_migration_cost
>
> in newidle_balance(), we skip to the out: label
>
> out:
> /* Move the next balance forward */
> if (time_after(this_rq->next_balance, next_balance))
> this_rq->next_balance = next_balance;
>
> if (pulled_task)
> this_rq->idle_stamp = 0;
> else
> nohz_newidle_balance(this_rq);
>
> and we call nohz_newidle_balance as we don't have a pulled_task.
>
> It seems to make sense to skip the call
> to nohz_newidle_balance() for this case?

nohz_newidle_balance() also tests this condition:

	this_rq->avg_idle < sysctl_sched_migration_cost

and doesn't set NOHZ_NEWILB_KICK in that case.

But this patch set now uses the condition:

	this_rq->avg_idle < sd->max_newidle_lb_cost

and sd->max_newidle_lb_cost can be higher than sysctl_sched_migration_cost,
which means that we can set NOHZ_NEWILB_KICK:
- although we decided to skip the newidle loop
- or when we abort because this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost

This is even more true when sysctl_sched_migration_cost is lowered, which is
your case IIRC.

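To make that window concrete, here is a standalone userspace sketch (not
kernel code). The struct and helper names are made up for illustration, and
it deliberately ignores the other checks done by both functions
(rd->overload, has_blocked, next_blocked, ...), so it only models the two
avg_idle comparisons discussed above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; the fields only mirror the kernel names, in ns. */
struct lb_model {
	uint64_t avg_idle;            /* models this_rq->avg_idle */
	uint64_t migration_cost;      /* models sysctl_sched_migration_cost */
	uint64_t max_newidle_lb_cost; /* models sd->max_newidle_lb_cost */
};

/* Comparison used by newidle_balance() in this series to skip the loop. */
static bool skips_newidle_loop(const struct lb_model *m)
{
	return m->avg_idle < m->max_newidle_lb_cost;
}

/* Comparison on which nohz_newidle_balance() returns without kicking. */
static bool nohz_refuses_kick(const struct lb_model *m)
{
	return m->avg_idle < m->migration_cost;
}

/* True when the loop is skipped but the kick is not ruled out. */
static bool kick_after_skip(const struct lb_model *m)
{
	return skips_newidle_loop(m) && !nohz_refuses_kick(m);
}
```

For example, with avg_idle = 300000, migration_cost = 100000 and
max_newidle_lb_cost = 500000, kick_after_skip() returns true: the newidle
loop is skipped, yet the avg_idle test in nohz_newidle_balance() does not
prevent the kick.
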
The patch below ensures that we don't set NOHZ_NEWILB_KICK in such cases:
---
kernel/sched/fair.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c19f4bb3df1a..36ddae208959 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10779,7 +10779,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	int this_cpu = this_rq->cpu;
 	u64 t0, t1, curr_cost = 0;
 	struct sched_domain *sd;
-	int pulled_task = 0;
+	int pulled_task = 0, early_stop = 0;

 	update_misfit_status(NULL, this_rq);

@@ -10816,8 +10816,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	if (!READ_ONCE(this_rq->rd->overload) ||
 	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {

-		if (sd)
+		if (sd) {
 			update_next_balance(sd, &next_balance);
+
+			/*
+			 * We skip new idle LB because there is not enough
+			 * time before next wake up. Make sure that we will
+			 * not kick NOHZ_NEWILB_KICK
+			 */
+			early_stop = 1;
+		}
 		rcu_read_unlock();

 		goto out;
@@ -10836,8 +10844,10 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)

 		update_next_balance(sd, &next_balance);

-		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
+		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
+			early_stop = 1;
 			break;
+		}

 		if (sd->flags & SD_BALANCE_NEWIDLE) {

@@ -10887,7 +10897,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)

 	if (pulled_task)
 		this_rq->idle_stamp = 0;
-	else
+	else if (!early_stop)
 		nohz_newidle_balance(this_rq);

 	rq_repin_lock(this_rq, rf);
--
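
For reference, the resulting decision at the out: label can be written as a
small truth table. This is only a userspace sketch of the control flow in
the last hunk; the enum and the function name are hypothetical and nothing
here calls into the scheduler:

```c
#include <stdbool.h>

/* Hypothetical labels for what the out: path does in each case. */
enum out_action {
	OUT_CLEAR_IDLE_STAMP, /* this_rq->idle_stamp = 0 */
	OUT_NOHZ_NEWIDLE,     /* nohz_newidle_balance(this_rq) is called */
	OUT_NOTHING,          /* neither branch runs */
};

/* Mirrors: if (pulled_task) ...; else if (!early_stop) ...; */
static enum out_action out_label_action(bool pulled_task, bool early_stop)
{
	if (pulled_task)
		return OUT_CLEAR_IDLE_STAMP;
	if (!early_stop)
		return OUT_NOHZ_NEWIDLE;
	return OUT_NOTHING;
}
```

The only combination whose behavior changes is pulled_task == 0 with
early_stop == 1, which used to reach nohz_newidle_balance() and now does
nothing.
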
> We expect a very short idle and a task to wake shortly.
> So we do not have to pull a task
> to this idle cpu and incur the migration cost.
>
> Tim
>
>
>
>