linux-kernel - Re: [PATCH v4 1/3] sched/fair: Ignore percpu threads for imbalance pulls

Message-ID: <CAKfTPtDtacSrrYaFFx=BsytweW+-6a3NaEn8Ks3WXr=gvqHBBA@mail.gmail.com>
Date:   Fri, 2 Apr 2021 14:52:51 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Lingutla Chandrasekhar <clingutla@...eaurora.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Qais Yousef <qais.yousef@....com>,
        Quentin Perret <qperret@...gle.com>,
        Pavan Kondeti <pkondeti@...eaurora.org>,
        Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH v4 1/3] sched/fair: Ignore percpu threads for imbalance pulls

On Thu, 1 Apr 2021 at 21:30, Valentin Schneider
<valentin.schneider@....com> wrote:
>
> From: Lingutla Chandrasekhar <clingutla@...eaurora.org>
>
> During load balance, LBF_SOME_PINNED will be set if any candidate task
> cannot be detached due to CPU affinity constraints. This can result in
> setting env->sd->parent->sgc->group_imbalance, which can lead to a group
> being classified as group_imbalanced (rather than any of the other, lower
> group_type) when balancing at a higher level.
>
> In workloads involving a single task per CPU, LBF_SOME_PINNED can often be
> set due to per-CPU kthreads being the only other runnable tasks on any
> given rq. This results in changing the group classification during
> load-balance at higher levels when in reality there is nothing that can be
> done for this affinity constraint: per-CPU kthreads, as the name implies,
> don't get to move around (modulo hotplug shenanigans).
>
> It's not as clear for userspace tasks - a task could be in an N-CPU cpuset
> with N-1 offline CPUs, making it an "accidental" per-CPU task rather than
> an intended one. KTHREAD_IS_PER_CPU gives us an indisputable signal which
> we can leverage here to not set LBF_SOME_PINNED.
>
> Note that the aforementioned classification to group_imbalanced (when
> nothing can be done) is especially problematic on big.LITTLE systems, which
> have a topology the likes of:
>
>   DIE [          ]
>   MC  [    ][    ]
>        0  1  2  3
>        L  L  B  B
>
>   arch_scale_cpu_capacity(L) < arch_scale_cpu_capacity(B)
>
> Here, setting LBF_SOME_PINNED due to a per-CPU kthread when balancing at MC
> level on CPUs [0-1] will subsequently prevent CPUs [2-3] from classifying
> the [0-1] group as group_misfit_task when balancing at DIE level. Thus, if
> CPUs [0-1] are running CPU-bound (misfit) tasks, ill-timed per-CPU kthreads
> can significantly delay the upmigration of said misfit tasks. Systems
> relying on ASYM_PACKING are likely to face similar issues.
>
> Signed-off-by: Lingutla Chandrasekhar <clingutla@...eaurora.org>
> [Use kthread_is_per_cpu() rather than p->nr_cpus_allowed]
> [Reword changelog]
> Signed-off-by: Valentin Schneider <valentin.schneider@....com>

Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>

> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6d73bdbb2d40..04d5e14fa261 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7567,6 +7567,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>         if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>                 return 0;
>
> +       /* Disregard pcpu kthreads; they are where they need to be. */
> +       if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
> +               return 0;
> +
>         if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
>                 int cpu;
>
> --
> 2.25.1
>
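
For readers following the changelog, here is a minimal userspace sketch of the
behaviour it describes. It is not kernel code: the sketch_* types, the
skip_pcpu_kthreads switch and the two-input classifier are hypothetical
stand-ins, and the flag value is illustrative only. It models two points from
the changelog: returning early for per-CPU kthreads keeps the affinity check
from setting LBF_SOME_PINNED, and a group flagged as imbalanced is classified
as such ahead of the lower misfit type.

```c
#include <stdbool.h>

/* Illustrative flag value; assumed for this sketch. */
#define SKETCH_LBF_SOME_PINNED 0x08u

/* Hypothetical stand-in for the task fields the patch looks at. */
struct sketch_task {
	bool is_kthread;            /* models (p->flags & PF_KTHREAD) */
	bool is_per_cpu;            /* models kthread_is_per_cpu(p) */
	unsigned long cpus_allowed; /* bitmask, models p->cpus_ptr */
};

/* Hypothetical stand-in for the parts of struct lb_env used here. */
struct sketch_env {
	int dst_cpu;
	unsigned int flags;
};

/*
 * Models the ordering of checks in can_migrate_task() around the patch.
 * With skip_pcpu_kthreads set (patched behaviour), a per-CPU kthread is
 * rejected before the affinity test, so SOME_PINNED is never raised for it.
 */
static bool sketch_can_migrate(const struct sketch_task *p,
			       struct sketch_env *env,
			       bool skip_pcpu_kthreads)
{
	if (skip_pcpu_kthreads && p->is_kthread && p->is_per_cpu)
		return false;

	if (!(p->cpus_allowed & (1UL << env->dst_cpu))) {
		/* Affinity forbids dst_cpu: record it for higher levels. */
		env->flags |= SKETCH_LBF_SOME_PINNED;
		return false;
	}

	return true;
}

/* Only the two group types discussed in the changelog, plus a default. */
enum sketch_group_type {
	SKETCH_GROUP_OTHER = 0,
	SKETCH_GROUP_MISFIT_TASK,
	SKETCH_GROUP_IMBALANCED, /* ranks above misfit, per the changelog */
};

/*
 * Simplified classifier: the imbalance flag inherited from the lower
 * level is tested first, so it masks a misfit task in the same group.
 */
static enum sketch_group_type sketch_classify(bool imbalance_flag,
					      bool has_misfit_task)
{
	if (imbalance_flag)
		return SKETCH_GROUP_IMBALANCED;
	if (has_misfit_task)
		return SKETCH_GROUP_MISFIT_TASK;
	return SKETCH_GROUP_OTHER;
}
```

With a per-CPU kthread pinned to CPU0 and dst_cpu = 1, the unpatched path
(skip_pcpu_kthreads = false) leaves SKETCH_LBF_SOME_PINNED set in env->flags,
while the patched path leaves flags untouched. Feeding that flag into
sketch_classify() alongside a misfit task yields the imbalanced type in the
first case and the misfit type in the second, which is the upmigration delay
the changelog is about.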