linux-kernel - Re: [PATCH v2] sched/rt: fix bad task migration for rt tasks

Message-ID: <CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com>
Date:   Fri, 1 Jul 2022 20:18:10 +0800
From:   Schspa Shi <schspa@...il.com>
To:     Valentin Schneider <vschneid@...hat.com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, Benjamin Segall <bsegall@...gle.com>,
        mgorman@...e.de, bristot@...hat.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched/rt: fix bad task migration for rt tasks

Valentin Schneider <vschneid@...hat.com> writes:

> On 27/06/22 23:40, Schspa Shi wrote:
>> @@ -2115,6 +2115,15 @@ static int push_rt_task(struct rq *rq, bool pull)
>>       if (WARN_ON(next_task == rq->curr))
>>               return 0;
>>
>> +    /*
>> +     * It is possible the task has running for a while, we need to check
>> +     * task migration disable flag again. If task migration is disabled,
>> +     * the retry code will retry to push the current running task on this
>> +     * CPU away.
>> +     */
>> +    if (unlikely(is_migration_disabled(next_task)))
>> +            goto retry;
>> +
>
> Can we ever hit this? The previous is_migration_disabled() check is in the
> same rq->lock segment.

Ah, sorry, I added this check in the wrong place. It should go right before
deactivate_task(rq, next_task, 0);
Sorry for the mistake.

>
> AFAIA this doesn't fix the problem v1 was fixing, which is next_task can
> become migrate_disable() after push_rt_task() goes through
> find_lock_lowest_rq().
>

Something like the following should fix it:

                put_task_struct(next_task);
                next_task = task;
                goto retry;
        }

        if (unlikely(is_migration_disabled(next_task))) {
                put_task_struct(next_task);
                goto retry;
        }

        deactivate_task(rq, next_task, 0);
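
To illustrate why the check has to sit right before deactivate_task(), here
is a rough userspace sketch of the "re-check after the lock may have been
dropped" pattern. It is not kernel code: struct toy_task, toy_find_lock_lowest_rq()
and toy_push_task() are hypothetical stand-ins, and the task_ran_meanwhile
flag simply simulates the task running and calling migrate_disable() while
the rq lock was released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct task_struct; not the kernel type. */
struct toy_task {
	bool migration_disabled;	/* models is_migration_disabled() */
	bool queued;			/* still on the source runqueue? */
	int refs;			/* models the task_struct refcount */
};

/*
 * Models find_lock_lowest_rq(): the source rq lock may be dropped while the
 * target rq is locked, so the task may run and disable migration meanwhile.
 */
static void toy_find_lock_lowest_rq(struct toy_task *t, bool task_ran_meanwhile)
{
	if (task_ran_meanwhile)
		t->migration_disabled = true;
}

/* Returns true if the task was pushed, false if the push was abandoned. */
static bool toy_push_task(struct toy_task *t, bool task_ran_meanwhile)
{
	/* First check, done before the lock can be dropped. */
	if (t->migration_disabled)
		return false;

	t->refs++;			/* models get_task_struct() */
	toy_find_lock_lowest_rq(t, task_ran_meanwhile);

	/* Second check: the state may have changed while unlocked. */
	if (t->migration_disabled) {
		t->refs--;		/* models put_task_struct() */
		return false;		/* the real code would "goto retry" */
	}

	t->queued = false;		/* models deactivate_task() */
	t->refs--;
	assert(t->refs >= 0);
	return true;
}
```

Calling toy_push_task() with task_ran_meanwhile set leaves the task queued
and its reference count balanced, which is the behaviour the extra check is
meant to give.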

> For the task to still be in the pushable_tasks list after having made
> itself migration disabled, it must no longer be current, which means we
> enqueued a higher priority RT task, in which case we went through
> set_next_task_rt() so we did rt_queue_push_tasks().

The current task does not necessarily have a higher priority; a task of
the same priority may have preempted the migration-disabled task.

In that case, we still have the opportunity to let the migration-disabled
task run sooner by migrating the task that preempted it to another CPU.
That is what commits
   95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
and
   1beec5b55060 ("sched: Fix migrate_disable() vs rt/dl balancing")
are doing.
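
The idea can be sketched as a small decision function. This is only a toy
model under my own assumptions, not the logic from those commits:
struct toy_rt_task and toy_pick_task_to_move() are hypothetical names, and
real balancing also has to consider priorities and CPU affinity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an RT task; not the kernel type. */
struct toy_rt_task {
	bool migration_disabled;
};

/*
 * Decide which task to move off this CPU so that the waiting task can run.
 * Returns NULL when neither task may be migrated.
 */
static struct toy_rt_task *toy_pick_task_to_move(struct toy_rt_task *curr,
						 struct toy_rt_task *waiting)
{
	/* Normal case: push the waiting task to another CPU. */
	if (!waiting->migration_disabled)
		return waiting;

	/* The waiting task is pinned: move the running task away instead. */
	if (!curr->migration_disabled)
		return curr;

	/* Both are pinned to this CPU, nothing can be balanced. */
	return NULL;
}
```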

Considering this, the v1 patch is not the best solution, which is why I sent
this v2 patch (although the check is misplaced in it).

Or can we ignore this small possibility?

>
> So I think what you had in v1 was actually what we needed.
>

Yes, v1 is the patch I have tested for a week; v2 has not been tested for
that long.


>>       /* We might release rq lock */
>>       get_task_struct(next_task);
>>
>> --
>> 2.24.3 (Apple Git-128)

-- 
Schspa Shi
BRs