Message-ID: <CAKfTPtAnzhDKXayicDdymWpK1UswfkTaO8vL-WHxVaoj7DaCFw@mail.gmail.com>
Date: Fri, 22 Jan 2021 17:56:22 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: "Joel Fernandes (Google)" <joel@...lfernandes.org>
Cc: linux-kernel <linux-kernel@...r.kernel.org>,
Paul McKenney <paulmck@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Dietmar Eggeman <dietmar.eggemann@....com>,
Qais Yousef <qais.yousef@....com>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] sched/fair: Rate limit calls to update_blocked_averages()
for NOHZ
On Fri, 22 Jan 2021 at 16:46, Joel Fernandes (Google)
<joel@...lfernandes.org> wrote:
>
> On an octacore ARM64 device running ChromeOS Linux kernel v5.4, I found
> that there are a lot of calls to update_blocked_averages(). This causes
> the schedule loop to slow down, taking up to 500 microseconds at
> times (due to newidle load balance). I have also seen this manifest in
> the periodic balancer.
>
> A closer look shows that the problem is caused by the following
> ingredients:
> 1. If the system has a lot of inactive CGroups (thanks Dietmar for
> suggesting to inspect /proc/sched_debug for this), this can make
> __update_blocked_fair() take a long time.
Inactive cgroups are removed from the list, so they should not impact
the duration.
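Here is a minimal sketch of the point above, under simplifying assumptions. The blocked-load walk only visits cfs_rqs that are still on the rq's leaf list. An entry whose blocked load has fully decayed is unlinked during the walk, so an idle cgroup stops adding to the cost once its signal reaches zero. The struct leaf_rq type, decay_blocked_load() and update_leaf_list() are hypothetical names. The real walk lives in __update_blocked_fair() and decays via PELT, not the one-shift-per-call toy rule used here.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a per-cgroup cfs_rq:
 * only the blocked load signal and the leaf-list link are modelled. */
struct leaf_rq {
	unsigned long load_avg;   /* blocked load that is still decaying */
	struct leaf_rq *next;     /* singly linked "leaf list" */
};

/* Toy decay rule (assumption): halve the signal on every update.
 * PELT's real decay is geometric over time, not per call. */
static void decay_blocked_load(struct leaf_rq *rq)
{
	rq->load_avg >>= 1;
}

/* Walk the leaf list, decay each entry, and unlink entries whose load
 * has reached zero so later walks no longer visit them.
 * Returns the number of entries visited in this walk. */
static size_t update_leaf_list(struct leaf_rq **head)
{
	size_t visited = 0;
	struct leaf_rq **link = head;

	while (*link) {
		struct leaf_rq *rq = *link;

		visited++;
		decay_blocked_load(rq);
		if (rq->load_avg == 0)
			*link = rq->next;   /* fully decayed: drop from the list */
		else
			link = &rq->next;   /* still carries load: keep it */
	}
	return visited;
}
```

With three entries carrying load 1, 4 and 0, the first walk visits all three and leaves only the second on the list. A few walks later the list is empty and a walk visits nothing.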
>
> 2. The device has a lot of CPUs in a cluster, which causes schedutil in a
> shared frequency domain configuration to be slower than usual. (The load
> average updates also try to update the frequency in schedutil.)
What exactly do you mean by "it causes schedutil to be slower than usual"?
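One plausible reading of the shared-domain cost is offered here as an assumption, not as what was measured. When CPUs share a frequency domain, a utilization update from any CPU makes schedutil look at every CPU in the policy. It searches for the CPU with the highest utilization relative to its capacity, so the work per update grows with the number of CPUs in the cluster. The sketch shows only that scan. struct cpu_sample and shared_policy_busiest() are hypothetical names. The real code also serializes on a per-policy lock and applies its own rate limit before scanning.

```c
#include <stddef.h>

/* Hypothetical per-CPU snapshot used by the scan. */
struct cpu_sample {
	unsigned long util;   /* current utilization of the CPU */
	unsigned long max;    /* capacity of the CPU */
};

/* Return the index of the CPU with the highest util/max ratio in a
 * shared policy and report how many CPUs were scanned (always nr, so
 * the cost is linear in the cluster size). Ratios are compared by
 * cross-multiplication: a/b > c/d  <=>  a*d > c*b for positive b, d.
 * Assumes nr >= 1. */
static size_t shared_policy_busiest(const struct cpu_sample *cpus, size_t nr,
				    size_t *scanned)
{
	size_t i, best = 0;

	for (i = 1; i < nr; i++) {
		if (cpus[i].util * cpus[best].max > cpus[best].util * cpus[i].max)
			best = i;
	}
	*scanned = nr;
	return best;
}
```

For four CPUs with (util, max) of (100, 1024), (300, 1024), (200, 512) and (50, 1024), the third CPU wins because 200/512 is the largest ratio, and all four entries are visited.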
>
> 3. The CPU is running at a low frequency, causing the scheduler/schedutil
> code paths to take longer than when running at a high CPU frequency.
Low frequency usually means low utilization, so it should not happen that much.
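The reasoning in the reply can be made concrete with schedutil's frequency selection. In kernels around v5.4 (assumed here), it requests roughly 1.25 * max_freq * util / capacity and computes the 25% headroom as freq + (freq >> 2). A CPU therefore runs at a low frequency mainly when its utilization is low. The helper mirrors that formula under the assumption of frequency-invariant utilization. The real code additionally resolves the result to a supported OPP, and the numbers in the usage note are made-up examples.

```c
/* Stand-alone copy (assumed to match v5.4-era schedutil) of the
 * utilization-to-frequency mapping:
 *   target = 1.25 * freq * util / cap
 * freq: policy maximum frequency (kHz here), cap: CPU capacity,
 * util: current utilization on the same scale as cap. */
static unsigned long map_util_freq(unsigned long util, unsigned long freq,
				   unsigned long cap)
{
	return (freq + (freq >> 2)) * util / cap;
}
```

On a hypothetical policy with a 2000000 kHz maximum and capacity 1024, map_util_freq(256, 2000000, 1024) yields 625000 kHz. That is, 25% utilization maps to about 31% of the maximum frequency.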
>
> The fix is simply to rate limit the calls to update_blocked_averages() to 20
> times per second. It appears that updating the blocked average less
> often is sufficient. Currently I see about 200 calls per second
> sometimes, which seems overkill.
It would be good to explain why updating less often is sufficient.
>
> schbench shows a clear improvement with the change:
Have you got more details about your test setup?
Which platform?
Which kernel?
>
> Without patch:
> ~/schbench -t 2 -m 2 -r 5
> Latency percentiles (usec) runtime 5 (s) (212 total samples)
> 50.0th: 210 (106 samples)
> 75.0th: 619 (53 samples)
> 90.0th: 665 (32 samples)
> 95.0th: 703 (11 samples)
> *99.0th: 12656 (8 samples)
> 99.5th: 12784 (1 samples)
> 99.9th: 13424 (1 samples)
> min=15, max=13424
>
> With patch:
> ~/schbench -t 2 -m 2 -r 5
> Latency percentiles (usec) runtime 5 (s) (214 total samples)
> 50.0th: 188 (108 samples)
> 75.0th: 238 (53 samples)
> 90.0th: 623 (32 samples)
> 95.0th: 657 (12 samples)
> *99.0th: 717 (7 samples)
> 99.5th: 725 (2 samples)
> 99.9th: 725 (0 samples)
>
> Cc: Paul McKenney <paulmck@...nel.org>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Suggested-by: Dietmar Eggeman <dietmar.eggemann@....com>
> Co-developed-by: Qais Yousef <qais.yousef@....com>
> Signed-off-by: Qais Yousef <qais.yousef@....com>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04a3ce20da67..fe2dc0024db5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8381,7 +8381,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
> if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
> return false;
>
> - if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
> + if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick + (HZ/20)))
This condition is there to make sure that the blocked load is updated at most
once per tick, in order to filter the newly idle case; otherwise, the rate
limiting is already done by the load balance interval.
This hard-coded (HZ/20) really looks like an ugly hack.
> return true;
>
> update_blocked_averages(cpu);
> --
> 2.30.0.280.ga3ce27912f-goog
>
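The sketch below isolates what the condition itself does, ignoring the load balance interval and any other gating mentioned in the reply. It counts how many calls would get past the filter when the function is invoked several times in every tick, as can happen with repeated newidle balancing. With extra == 0 it models the original check, which only deduplicates calls within one tick. With extra == HZ/20 it models the patch. count_updates() and its parameters are hypothetical, and the uniform call pattern is an assumption, not a trace.

```c
#include <stdbool.h>

/* Wraparound-safe "a is after b", the same idea as time_after(a, b). */
static bool jiffies_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Count how many calls would reach update_blocked_averages() when it is
 * invoked calls_per_tick times in each of nr_ticks jiffies, using the skip
 * rule "!time_after(now, last + extra)". extra == 0 models the original
 * once-per-tick filter; extra == HZ / 20 models the patched one. */
static unsigned long count_updates(unsigned long nr_ticks,
				   unsigned long calls_per_tick,
				   unsigned long extra)
{
	unsigned long now, call, updates = 0;
	unsigned long last = 0;   /* pretend an update happened at jiffy 0 */

	for (now = 1; now <= nr_ticks; now++) {
		for (call = 0; call < calls_per_tick; call++) {
			if (!jiffies_after(now, last + extra))
				continue;   /* filtered out, no update */
			last = now;         /* update done, remember the tick */
			updates++;
		}
	}
	return updates;
}
```

For one second at HZ=250 with four calls per tick, count_updates(250, 4, 0) returns 250 while count_updates(250, 4, 250 / 20) returns 19. The original check bounds updates to one per tick, and the patch tightens that bound further.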