linux-kernel - Re: [PATCH v2] sched/rt: fix bad task migration for rt tasks

Message-ID: <xhsmhh7415e12.mognet@vschneid.remote.csb>
Date:   Fri, 01 Jul 2022 11:21:45 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Schspa Shi <schspa@...il.com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com
Cc:     linux-kernel@...r.kernel.org, zhaohui.shi@...izon.ai,
        Schspa Shi <schspa@...il.com>
Subject: Re: [PATCH v2] sched/rt: fix bad task migration for rt tasks

On 27/06/22 23:40, Schspa Shi wrote:
> @@ -2115,6 +2115,15 @@ static int push_rt_task(struct rq *rq, bool pull)
>       if (WARN_ON(next_task == rq->curr))
>               return 0;
>
> +	/*
> +	 * It is possible the task has running for a while, we need to check
> +	 * task migration disable flag again. If task migration is disabled,
> +	 * the retry code will retry to push the current running task on this
> +	 * CPU away.
> +	 */
> +	if (unlikely(is_migration_disabled(next_task)))
> +		goto retry;
> +

Can we ever hit this? The previous is_migration_disabled() check is in the
same rq->lock segment.

AFAIA this doesn't fix the problem v1 was fixing, which is that next_task
can become migration disabled after push_rt_task() goes through
find_lock_lowest_rq().

For the task to still be in the pushable_tasks list after having made
itself migration disabled, it must no longer be current, which means we
enqueued a higher priority RT task, in which case we went through
set_next_task_rt() so we did rt_queue_push_tasks().

So I think what you had in v1 was actually what we needed.
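
To illustrate the ordering argument above, here is a minimal userspace
sketch. It is not kernel code: struct task_model, lock_dropped_window() and
push_task_model() are hypothetical names made up for this example, and it
assumes that the v1 approach amounted to re-checking the flag after the lock
had been retaken, which this message implies but does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; this is not the kernel's task_struct. */
struct task_model {
	bool migration_disabled;
};

/*
 * Stand-in for the window in which the rq lock is released and retaken
 * (as find_lock_lowest_rq() may do). If the task gets to run in that
 * window, it may disable migration.
 */
static void lock_dropped_window(struct task_model *t, bool task_runs)
{
	if (task_runs)
		t->migration_disabled = true;
}

/*
 * Returns true if the model would push the task to another CPU.
 * recheck_after_relock selects whether the flag is tested again once
 * the lock has been retaken (assumed here to be the v1 behaviour).
 */
static bool push_task_model(struct task_model *t, bool task_runs,
			    bool recheck_after_relock)
{
	assert(t != NULL);

	/* Check made before the lock is ever dropped. */
	if (t->migration_disabled)
		return false;

	/* The lock is released and retaken; the task may run meanwhile. */
	lock_dropped_window(t, task_runs);

	/* Only a check placed here can observe the new state. */
	if (recheck_after_relock && t->migration_disabled)
		return false;

	return true;
}
```

In this model, a check placed before the lock is dropped never sees the flag
change, so push_task_model(&t, true, false) still pushes the task, while
push_task_model(&t, true, true) does not.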

>       /* We might release rq lock */
>       get_task_struct(next_task);
>
> --
> 2.24.3 (Apple Git-128)