linux-kernel - Re: [PATCH v8 1/2] sched/rt: fix bad task migration for rt tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <xhsmhile99qdo.mognet@vschneid.remote.csb>
Date:   Thu, 06 Apr 2023 12:55:31 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Schspa Shi <schspa@...il.com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com
Cc:     linux-kernel@...r.kernel.org, zhaohui.shi@...izon.ai,
        Schspa Shi <schspa@...il.com>
Subject: Re: [PATCH v8 1/2] sched/rt: fix bad task migration for rt tasks

On 29/08/22 01:03, Schspa Shi wrote:
> Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
> allow find_lock_lowest_rq to pick a task with migration disabled.
> This commit is intended to push the current running task on this CPU
> away.
>
> There is a race scenario, which allows a migration disabled task to
> be migrated to another CPU.
>
> When there is a RT task with higher priority, rt sched class was
> intended to migrate higher priority task to lowest rq via push_rt_tasks,
> this WARNING will happen here.
>
> With the system running on PREEMPT_RT, rt_spin_lock will disable
> migration, this will make the problem easier to reproduce.
>
> I have seen this WARNING on PREEMPT_RT, from the logs, there is a race
> when trying to migrate higher priority tasks to the lowest rq.
>
> Please refer to the following scenarios.
>
>            CPU0                                  CPU1
> ------------------------------------------------------------------
> push_rt_task
>   check is_migration_disabled(next_task)
>                                         task not running and
>                                         migration_disabled == 0
>   find_lock_lowest_rq(next_task, rq);
>     _double_lock_balance(this_rq, busiest);
>       raw_spin_rq_unlock(this_rq);
>       double_rq_lock(this_rq, busiest);
>         <<wait for busiest rq>>
>                                             <wakeup>
>                                         task become running
>                                         migrate_disable();
>                                           <context out>
>   deactivate_task(rq, next_task, 0);
>   set_task_cpu(next_task, lowest_rq->cpu);
>     WARN_ON_ONCE(is_migration_disabled(p));
>       ---------OOPS-------------
>
> Crash logs are as follows:
> [123671.996430] WARNING: CPU: 2 PID: 13470 at kernel/sched/core.c:2485
> set_task_cpu+0x8c/0x108
> [123671.996800] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> [123671.996811] pc : set_task_cpu+0x8c/0x108
> [123671.996820] lr : set_task_cpu+0x7c/0x108
> [123671.996828] sp : ffff80001268bd30
> [123671.996832] pmr_save: 00000060
> [123671.996835] x29: ffff80001268bd30 x28: ffff0001a3d68e80
> [123671.996844] x27: ffff80001225f4a8 x26: ffff800010ab62cb
> [123671.996854] x25: ffff80026d95e000 x24: 0000000000000005
> [123671.996864] x23: ffff00019746c1b0 x22: 0000000000000000
> [123671.996873] x21: ffff00027ee33a80 x20: 0000000000000000
> [123671.996882] x19: ffff00019746ba00 x18: 0000000000000000
> [123671.996890] x17: 0000000000000000 x16: 0000000000000000
> [123671.996899] x15: 000000000000000a x14: 000000000000349e
> [123671.996908] x13: ffff800012f4503d x12: 0000000000000001
> [123671.996916] x11: 0000000000000000 x10: 0000000000000000
> [123671.996925] x9 : 00000000000c0000 x8 : ffff00027ee58700
> [123671.996933] x7 : ffff00027ee8da80 x6 : ffff00027ee8e580
> [123671.996942] x5 : ffff00027ee8dcc0 x4 : 0000000000000005
> [123671.996951] x3 : ffff00027ee8e338 x2 : 0000000000000000
> [123671.996959] x1 : 00000000000000ff x0 : 0000000000000002
> [123671.996969] Call trace:
> [123671.996975]  set_task_cpu+0x8c/0x108
> [123671.996984]  push_rt_task.part.0+0x144/0x184
> [123671.996995]  push_rt_tasks+0x28/0x3c
> [123671.997002]  task_woken_rt+0x58/0x68
> [123671.997009]  ttwu_do_wakeup+0x5c/0xd0
> [123671.997019]  ttwu_do_activate+0xc0/0xd4
> [123671.997028]  try_to_wake_up+0x244/0x288
> [123671.997036]  wake_up_process+0x18/0x24
> [123671.997045]  __irq_wake_thread+0x64/0x80
> [123671.997056]  __handle_irq_event_percpu+0x110/0x124
> [123671.997064]  handle_irq_event_percpu+0x50/0xac
> [123671.997072]  handle_irq_event+0x84/0xfc
>
> To fix it, we need to check migration_disabled flag again to avoid
> bad migration.
>
> Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
>
> CC: Valentin Schneider <vschneid@...hat.com>
> Signed-off-by: Schspa Shi <schspa@...il.com>
> Reviewed-by: Steven Rostedt (Google) <rostedt@...dmis.org>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@....com>
>

Sorry, I lost track of that one, and ironically it looks like we've hit
this bug internally. I'm going to test whether this is the cure we need,
but even if this isn't the same issue, that patch looks good:

Reviewed-by: Valentin Schneider <vschneid@...hat.com>

I have some more comments on 2/2, but IMO this can go in on its own.