Message-ID: <CAKfTPtDDSzLi7PEJkBqepx9cRgmbBKy2ZXJuT0h62e3RkQBoYw@mail.gmail.com>
Date: Mon, 24 Jun 2024 11:18:59 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Josh Don <joshdon@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Revert "sched/fair: Make sure to try to detach at least
one movable task"
On Thu, 20 Jun 2024 at 23:45, Josh Don <joshdon@...gle.com> wrote:
>
> This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06.
>
> b0defa7ae03ec changed the load balancing logic to ignore env.loop_max if
> all tasks examined to that point were pinned. The goal of the patch was
> to make it more likely to be able to detach a task buried in a long list
> of pinned tasks. However, this has the unfortunate side effect of
> creating an O(n) iteration in detach_tasks(), as we now must fully
> iterate every task on a cpu if all or most are pinned. Since this load
> balance code is done with rq lock held, and often in softirq context, it
> is very easy to trigger hard lockups. We observed such hard lockups with
> a user who affined O(10k) threads to a single cpu.
>
> When I discussed this with Vincent he initially suggested that we keep
> the limit on the number of tasks to detach, but increase the number of
> tasks we can search. However, after some back and forth on the mailing
> list, he recommended we instead revert the original patch, as it seems
> likely no one was actually getting hit by the original issue.
>
Maybe add a
Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task")
> Signed-off-by: Josh Don <joshdon@...gle.com>
Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
> kernel/sched/fair.c | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 34fe6e9490c2..a5416798702b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9043,12 +9043,8 @@ static int detach_tasks(struct lb_env *env)
> break;
>
> env->loop++;
> - /*
> - * We've more or less seen every task there is, call it quits
> - * unless we haven't found any movable task yet.
> - */
> - if (env->loop > env->loop_max &&
> - !(env->flags & LBF_ALL_PINNED))
> + /* We've more or less seen every task there is, call it quits */
> + if (env->loop > env->loop_max)
> break;
>
> /* take a breather every nr_migrate tasks */
> @@ -11328,9 +11324,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>
> if (env.flags & LBF_NEED_BREAK) {
> env.flags &= ~LBF_NEED_BREAK;
> - /* Stop if we tried all running tasks */
> - if (env.loop < busiest->nr_running)
> - goto more_balance;
> + goto more_balance;
> }
>
> /*
> --
> 2.45.2.741.gdbec12cfda-goog
>