Message-ID: <YY6GfilrilzTmhZx@hirez.programming.kicks-ass.net>
Date: Fri, 12 Nov 2021 16:21:34 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, linux-kernel@...r.kernel.org,
tim.c.chen@...ux.intel.com, joel@...lfernandes.org
Subject: Re: [PATCH 2/2] sched: sched: Fix rq->next_balance time updated to
earlier than current time
On Fri, Nov 12, 2021 at 11:04:58AM +0100, Vincent Guittot wrote:
> From: Tim Chen <tim.c.chen@...ux.intel.com>
>
> In traces on newidle_balance(), this_rq->next_balance
> time goes backward and earlier than current time jiffies, e.g.
>
> 11.602 ( ): probe:newidle_balance:(ffffffff810d2470) this_rq=0xffff88fe7f8aae00 next_balance=0x1004fb76c jiffies=0x1004fb739
> 11.624 ( ): probe:newidle_balance:(ffffffff810d2470) this_rq=0xffff88fe7f8aae00 next_balance=0x1004fb731 jiffies=0x1004fb739
> 13.856 ( ): probe:newidle_balance:(ffffffff810d2470) this_rq=0xffff88fe7f8aae00 next_balance=0x1004fb76c jiffies=0x1004fb73b
> 13.910 ( ): probe:newidle_balance:(ffffffff810d2470) this_rq=0xffff88fe7f8aae00 next_balance=0x1004fb731 jiffies=0x1004fb73b
> 14.637 ( ): probe:newidle_balance:(ffffffff810d2470) this_rq=0xffff88fe7f8aae00 next_balance=0x1004fb76c jiffies=0x1004fb73c
> 14.666 ( ): probe:newidle_balance:(ffffffff810d2470) this_rq=0xffff88fe7f8aae00 next_balance=0x1004fb731 jiffies=0x1004fb73c
No explanation of what these numbers mean, or where they're taken from.
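Reading the six quoted trace lines as raw jiffies values is an assumption, since the changelog doesn't say how they were captured. Under that assumption, the interesting quantity is the signed distance from jiffies to next_balance. The minimal userspace sketch below computes it with the same wrap-safe subtraction that time_after() uses. It assumes an LP64 target, so that unsigned long can hold the 33-bit values shown. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* { next_balance, jiffies } pairs copied verbatim from the quoted trace. */
static const unsigned long trace[][2] = {
	{ 0x1004fb76cUL, 0x1004fb739UL },
	{ 0x1004fb731UL, 0x1004fb739UL },
	{ 0x1004fb76cUL, 0x1004fb73bUL },
	{ 0x1004fb731UL, 0x1004fb73bUL },
	{ 0x1004fb76cUL, 0x1004fb73cUL },
	{ 0x1004fb731UL, 0x1004fb73cUL },
};

#define TRACE_LEN (sizeof(trace) / sizeof(trace[0]))

/*
 * Hypothetical helper: signed distance in jiffies from now to target.
 * Negative means target already lies in the past. The unsigned
 * subtraction wraps and the cast back to long gives the same answer
 * time_after() would, provided the values are less than LONG_MAX apart.
 */
static long jiffies_delta(unsigned long target, unsigned long now)
{
	return (long)(target - now);
}

/* Count trace samples whose next_balance is already behind jiffies. */
static size_t count_in_past(void)
{
	size_t n = 0;

	for (size_t i = 0; i < TRACE_LEN; i++)
		if (jiffies_delta(trace[i][0], trace[i][1]) < 0)
			n++;
	return n;
}
```

Worked by hand, the alternating pairs come out at +51/-8, +49/-10 and +48/-11 jiffies. Every second sample therefore has next_balance 8 to 11 ticks behind jiffies.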
> It doesn't make sense to have a next_balance in the past.
> Fix newidle_balance() and update_next_balance() so the next
> balance time is at least jiffies+1.
The changelog is deficient in that it doesn't explain how the times end
up in the past; therefore we cannot evaluate whether the provided
solution is sufficient, etc.
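The quoted update_next_balance() hunk already shows one way the value can land in the past. The unpatched line next = sd->last_balance + interval falls behind jiffies whenever the domain was last balanced more than interval jiffies ago. The sketch below is a hypothetical userspace model that compares the unpatched computation with the patched one. struct sd_model and both function names are made up, and get_sd_balance_interval() is replaced by a plain field. It does not claim that this is the path behind the trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Same arithmetic as the kernel's time_after(a, b): true if a is after b. */
static bool time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the sched_domain inputs the hunk reads. */
struct sd_model {
	unsigned long last_balance;
	unsigned long interval;	/* replaces get_sd_balance_interval(sd, 0) */
};

/* Unpatched: next = last_balance + interval, then keep the earlier value. */
static void next_balance_unpatched(const struct sd_model *sd,
				   unsigned long *next_balance)
{
	unsigned long next = sd->last_balance + sd->interval;

	if (time_after(*next_balance, next))
		*next_balance = next;
}

/* Patched: the candidate is first clamped to at least now + 1. */
static void next_balance_patched(const struct sd_model *sd,
				 unsigned long *next_balance, unsigned long now)
{
	unsigned long next = sd->last_balance + sd->interval;

	if (time_after(now + 1, next))
		next = now + 1;
	if (time_after(*next_balance, next))
		*next_balance = next;
}
```

Take now = 1020, last_balance = 1000, interval = 8 and an incoming next_balance of 1080. The unpatched model yields 1008, which is 12 jiffies in the past. The patched one yields 1021.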
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
> kernel/sched/fair.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a162b0ec8963..1050037578a9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10138,7 +10138,10 @@ update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
>
> /* used by idle balance, so cpu_busy = 0 */
> interval = get_sd_balance_interval(sd, 0);
> - next = sd->last_balance + interval;
> + if (time_after(jiffies+1, sd->last_balance + interval))
> + next = jiffies+1;
> + else
> + next = sd->last_balance + interval;
>
> if (time_after(*next_balance, next))
> *next_balance = next;
> @@ -10974,6 +10977,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>
> out:
> /* Move the next balance forward */
> + if (time_after(jiffies+1, this_rq->next_balance))
> + this_rq->next_balance = jiffies+1;
jiffies roll over here..
Also, what's the point of the update_next_balance() addition in the face
of this one? AFAICT this hunk completely renders the other hunk useless.
> if (time_after(this_rq->next_balance, next_balance))
> this_rq->next_balance = next_balance;
and you've violated your own premise :-)
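The premise breaks because the clamp runs before the final min. If the local next_balance is itself behind jiffies, the time_after() check on the following lines copies it straight back. Below is a hypothetical userspace model of the quoted out: hunk, with the order swapped for comparison. The function names are made up, and now stands in for a single read of jiffies.

```c
#include <assert.h>
#include <stdbool.h>

static bool time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Order as in the patch: clamp rq_next to now + 1, then take the earlier. */
static unsigned long clamp_then_min(unsigned long rq_next,
				    unsigned long local_next, unsigned long now)
{
	if (time_after(now + 1, rq_next))
		rq_next = now + 1;
	if (time_after(rq_next, local_next))
		rq_next = local_next;	/* may reintroduce a past value */
	return rq_next;
}

/* Swapped order: take the earlier value first, then clamp the result. */
static unsigned long min_then_clamp(unsigned long rq_next,
				    unsigned long local_next, unsigned long now)
{
	if (time_after(rq_next, local_next))
		rq_next = local_next;
	if (time_after(now + 1, rq_next))
		rq_next = now + 1;	/* result is never before now + 1 */
	return rq_next;
}
```

Take now = 100, rq_next = 150 and a stale local value of 90. clamp_then_min() returns 90, while min_then_clamp() returns 101.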
Now, this pattern is repeated throughout; if it's a problem here, why
isn't it a problem in, say, rebalance_domains()?
Can we please unify the code across sites instead of growing different
hacks in different places?
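One possible shape for a single helper is sketched below. newidle_balance(), update_next_balance(), rebalance_domains() and similar sites could all call it instead of open-coding the comparison. The name sched_next_balance() and its signature are hypothetical. jiffies is passed in as now, so the caller reads it exactly once.

```c
#include <assert.h>
#include <stdbool.h>

static bool time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical helper: fold candidate into *next_balance, keeping the
 * earlier of the two, then raise the result to at least now + 1.
 * Because the clamp is applied last, the value left in *next_balance
 * when this returns is never before now + 1 (wrap-safe via time_after()).
 */
static void sched_next_balance(unsigned long *next_balance,
			       unsigned long candidate, unsigned long now)
{
	if (time_after(*next_balance, candidate))
		*next_balance = candidate;
	if (time_after(now + 1, *next_balance))
		*next_balance = now + 1;
}
```

A caller would snapshot now = jiffies once. It would then pass each candidate through the helper, for example a domain's last_balance + interval or the rq's own value, so that every site applies the same rule.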