linux-kernel - Re: [PATCH] sched/fair: Don't balance migration disabled tasks

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <xhsmh4jo3qngr.mognet@vschneid.remote.csb>
Date:   Tue, 23 May 2023 12:47:00 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Yicong Yang <yangyicong@...wei.com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, linux-kernel@...r.kernel.org
Cc:     yangyicong@...ilicon.com, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, linuxarm@...wei.com, prime.zeng@...wei.com,
        wangjie125@...wei.com
Subject: Re: [PATCH] sched/fair: Don't balance migration disabled tasks

On 16/05/23 19:10, Yicong Yang wrote:
> Hi Valentin,
> Sorry for the late reply. Yes, it can be reproduced on the upstream kernel (tested below on
> 6.4-rc1). Since it only happens occasionally with a normal setup, I wrote a test kthread
> that periodically disables and re-enables migration:
>
> static int workload_func(void *data)
> {
>       cpumask_var_t cpumask;
>       int i;
>
>       if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
>               return -ENOMEM;
>
>       /* Allow the kthread to run on CPUs 0-7 only. */
>       for (i = 0; i < 8; i++)
>               cpumask_set_cpu(i, cpumask);
>
>       set_cpus_allowed_ptr(current, cpumask);
>       free_cpumask_var(cpumask);
>
>       while (!kthread_should_stop()) {
>               /* Pin to the current CPU and offer a preemption point while pinned. */
>               migrate_disable();
>               mdelay(1000);
>               cond_resched();
>               migrate_enable();
>               /* Run unpinned for a while before pinning again. */
>               mdelay(1000);
>       }
>
>       return -1;
> }
>
> Launching this and binding another workload to the CPU it's currently running on, e.g.
> `taskset -c $workload_cpu stress-ng -c 1`, will trigger the issue. In fact, the problem
> is not caused by the migration disable mechanism, which works well, but by the
> balancing policy after it finds that all the tasks on the source CPU are pinned. With the
> debug print below added:
>
> @@ -8527,6 +8527,20 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>         if (kthread_is_per_cpu(p))
>                 return 0;
>
> +       if (is_migration_disabled(p)) {
> +               if (!p->on_cpu && cpumask_test_cpu(env->dst_cpu, p->cpus_ptr))
> +                       pr_err("dst_cpu %d on_cpu %d cpus_ptr %*pbl cpus_mask %*pbl",
> +                               env->dst_cpu, p->on_cpu, cpumask_pr_args(p->cpus_ptr),
> +                               cpumask_pr_args(&p->cpus_mask));
> +       }
> +
>         if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
>                 int cpu;
>
> I got the output below:
>
> [  686.135619] dst_cpu 1 on_cpu 0 cpus_ptr 1 cpus_mask 0-7
> [  686.148809] ------------[ cut here ]------------
> [  686.169505] WARNING: CPU: 64 PID: 0 at kernel/sched/core.c:3210 set_task_cpu+0x190/0x250
> [  686.186537] Modules linked in: kthread_workload(O) bluetooth rfkill xt_CHECKSUM iptable_mangle xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi crct10dif_ce hns_roce_hw_v2 arm_spe_pmu sbsa_gwdt sm4_generic sm4 xts ecb hisi_hpre hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_trng_v2 rng_core hisi_uncore_pmu spi_dw_mmio hisi_zip hisi_sec2 hisi_qm uacce hclge hns3 hnae3 hisi_sas_v3_hw hisi_sas_main libsas [last unloaded: kthread_workload(O)]
> [  686.293937] CPU: 64 PID: 0 Comm: swapper/64 Tainted: G           O       6.4.0-rc1-migration-race-debug+ #24
> [  686.314616] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B211.01 11/10/2021
> [  686.333285] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  686.347930] pc : set_task_cpu+0x190/0x250
> [...]
>
> This shows that we're trying to balance the task to its current CPU (CPU 1) rather than
> to the balancing CPU (CPU 64). That's because we look for a new dst_cpu when the task
> on the src_cpu is pinned, and the new_dst_cpu happens to be the task's current CPU.
>

Nicely found! Thanks for having spent time on this. I haven't been able to
retrigger the issue using your reproducer, but I think you have indeed
found the root cause of it.

> So the right way to solve this may be to avoid selecting the src_cpu as the new_dst_cpu;
> the patch below fixes this issue.
>
> @@ -8550,7 +8564,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>
>                 /* Prevent to re-select dst_cpu via env's CPUs: */
>                 for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
> -                       if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
> +                       if (cpumask_test_cpu(cpu, p->cpus_ptr) && cpu != env->src_cpu) {
>                                 env->flags |= LBF_DST_PINNED;
>                                 env->new_dst_cpu = cpu;
>                                 break;
>

Other than having some better in-place cpumask helpers, I don't think we can
make this look better. Could you send this change as a proper patch, please?

> Thanks,
> Yicong