lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 16 Dec 2021 18:26:38 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Yihao Wu <wuyihao@...ux.alibaba.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Shanpei Chen <shanpeic@...ux.alibaba.com>,
        王贇 <yun.wang@...ux.alibaba.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: Again ignore percpu threads for imbalance pulls

On 11/12/21 17:48, Yihao Wu wrote:
> commit 2f5f4cce496e ("sched/fair: Ignore percpu threads for imbalance
> pulls") was meant to fix a performance issue, when load balance tries to
> migrate pinned kernel threads at MC domain level. This was destined to
> fail.

> After it fails, it further makes wakeup balance at NUMA domain level
> messed up. The most severe case that I noticed and frequently occurs:
>     |sum_nr_running(node1) - sum_nr_running(node2)| > 100
>

Wakeup balance (aka find_idlest_cpu()) is different from periodic load
balance (aka load_balance()) and doesn't use can_migrate_task(), so the
incriminated commit shouldn't have impacted it (at least not in obvious
ways...). Do you have any more details on that issue?

> However the original bugfix failed, because it covers only case 1) below.
>   1) Created by create_kthread
>   2) Created by kernel_thread
> No kthread is assigned to task_struct in case 2 (Please refer to comments
> in free_kthread_struct) so it simply won't work.
>
> The easist way to cover both cases is to check nr_cpus_allowed, just as
> discussed in the mailing list of the v1 version of the original fix.
>
> * lmbench3.lat_proc -P 104 fork (2 NUMA, and 26 cores, 2 threads)
>

Reasoning about "proper" pcpu kthreads was simpler since they are static,
see 3a7956e25e1d ("kthread: Fix PF_KTHREAD vs to_kthread() race")

>                          w/out patch                 w/ patch
> fork+exit latency            1660 ms                  1520 ms (   8.4%)
>
> Fixes: 2f5f4cce496e ("sched/fair: Ignore percpu threads for imbalance pulls")
> Signed-off-by: Yihao Wu <wuyihao@...ux.alibaba.com>
> ---
>  kernel/kthread.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 4a4d7092a2d8..cb05d3ff2de4 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -543,11 +543,7 @@ void kthread_set_per_cpu(struct task_struct *k, int cpu)
>
>  bool kthread_is_per_cpu(struct task_struct *p)
>  {
> -	struct kthread *kthread = __to_kthread(p);
> -	if (!kthread)
> -		return false;
> -
> -	return test_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
> +	return (p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1;
>  }

As Peter said, this is going to cause issues. If you look at
kthread_set_per_cpu(), we also store a CPU value which we expect to be
valid when kthread_is_per_cpu(), which that change is breaking.

AIUI what you want to patch is the actual usage in can_migrate_task()

>
>  /**
> --
> 2.32.0.604.gb1f3e1269

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ