Date: Mon, 24 Jun 2024 11:18:59 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Josh Don <joshdon@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Daniel Bristot de Oliveira <bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Revert "sched/fair: Make sure to try to detach at least
 one movable task"

On Thu, 20 Jun 2024 at 23:45, Josh Don <joshdon@...gle.com> wrote:
>
> This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06.
>
> b0defa7ae03ec changed the load balancing logic to ignore env.loop_max if
> all tasks examined to that point were pinned. The goal of the patch was
> to make it more likely to be able to detach a task buried in a long list
> of pinned tasks. However, this has the unfortunate side effect of
> creating an O(n) iteration in detach_tasks(), as we now must fully
> iterate every task on a cpu if all or most are pinned. Since this load
> balancing code runs with the rq lock held, and often in softirq context,
> it is very easy to trigger hard lockups. We observed such hard lockups
> with a user who had affined O(10k) threads to a single cpu.
>
> When I discussed this with Vincent he initially suggested that we keep
> the limit on the number of tasks to detach, but increase the number of
> tasks we can search. However, after some back and forth on the mailing
> list, he recommended we instead revert the original patch, as it seems
> likely no one was actually getting hit by the original issue.
>

Maybe add a:
Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least
one movable task")

> Signed-off-by: Josh Don <joshdon@...gle.com>

Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>

> ---
>  kernel/sched/fair.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 34fe6e9490c2..a5416798702b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9043,12 +9043,8 @@ static int detach_tasks(struct lb_env *env)
>                         break;
>
>                 env->loop++;
> -               /*
> -                * We've more or less seen every task there is, call it quits
> -                * unless we haven't found any movable task yet.
> -                */
> -               if (env->loop > env->loop_max &&
> -                   !(env->flags & LBF_ALL_PINNED))
> +               /* We've more or less seen every task there is, call it quits */
> +               if (env->loop > env->loop_max)
>                         break;
>
>                 /* take a breather every nr_migrate tasks */
> @@ -11328,9 +11324,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>
>                 if (env.flags & LBF_NEED_BREAK) {
>                         env.flags &= ~LBF_NEED_BREAK;
> -                       /* Stop if we tried all running tasks */
> -                       if (env.loop < busiest->nr_running)
> -                               goto more_balance;
> +                       goto more_balance;
>                 }
>
>                 /*
> --
> 2.45.2.741.gdbec12cfda-goog
>
