linux-kernel - Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks

Message-ID: <m2v8s7nzr6.fsf@gmail.com>
Date:   Sat, 09 Jul 2022 03:55:40 +0800
From:   Schspa Shi <schspa@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks


Steven Rostedt <rostedt@...dmis.org> writes:

> On Sat, 09 Jul 2022 03:14:44 +0800
> Schspa Shi <schspa@...il.com> wrote:
>
>> Steven Rostedt <rostedt@...dmis.org> writes:
>> 
>> > On Sat, 09 Jul 2022 02:19:42 +0800
>> > Schspa Shi <schspa@...il.com> wrote:
>> >  
>> >> Yes, it's what I did in the V1 patch.
>> >> Link: https://lore.kernel.org/all/20220623182932.58589-1-schspa@gmail.com/
>> >> 
>> >> But I think it's not the best solution for this problem.
>> >> In these scenarios, we still have a chance to make the task run faster
>> >> by retrying to push the currently running task on this CPU away.
>> >> 
>> >> There are more details in the reply to the V2 patch.
>> >> Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t  
>> >
>> > The thing is, this situation can only happen if we release the rq lock in
>> > find_lock_lowest_rq(), and we should not be checking for it in the other
>> > cases.
>> >  
>> 
>> If we haven't unlocked the rq in find_lock_lowest_rq(), it will return
>> NULL, so it won't reach the code added here:
>> 
>> 	if (unlikely(is_migration_disabled(next_task))) {
>> 		put_task_struct(next_task);
>> 		goto retry;
>> 	}
>
> Because it doesn't need to. If it did not unlock the run queue, there's no
> way that next_task could have run, because we hold the rq lock for
> next_task. Which means that its "migrate_disable" state would not have
> changed from the first time we checked.
>

OK, I get it.
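
To check my understanding, here is a minimal userspace sketch of that
reasoning. It is not kernel code: the toy_* types and functions are
hypothetical stand-ins, and the "window" callback only models whatever
next_task might do while the rq lock is dropped. The point is that the
migration-disabled state needs to be rechecked only on the path where
the lock was released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; all names here are hypothetical. */
struct toy_task {
	int  cpu;                 /* CPU whose run queue holds the task */
	bool migration_disabled;  /* models is_migration_disabled(task) */
};

struct toy_rq {
	int cpu;
};

/* Models what the task may do while the source rq lock is dropped. */
typedef void (*toy_window_fn)(struct toy_task *task);

static void toy_disable_migration(struct toy_task *task)
{
	task->migration_disabled = true;
}

/*
 * Simplified model of the lookup: the caller has already verified that
 * the task is migratable. If the source rq lock was never dropped, the
 * task could not have run, so its state is unchanged and no recheck is
 * needed. If the lock was dropped, the task must be revalidated before
 * the target run queue is returned.
 */
static struct toy_rq *toy_find_lock_lowest_rq(struct toy_task *task,
					      struct toy_rq *rq,
					      struct toy_rq *lowest,
					      bool lock_dropped,
					      toy_window_fn window)
{
	if (!lowest)
		return NULL;	/* no run queue to push to */

	if (lock_dropped) {
		if (window)
			window(task);	/* task may have run meanwhile */

		/* Revalidate: same CPU and still allowed to migrate? */
		if (task->cpu != rq->cpu || task->migration_disabled)
			return NULL;
	}

	return lowest;
}
```

With this model, a NULL return covers both "no target found" and "task
changed while the lock was dropped", and in either case the caller has
nothing to migrate.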

>> 
>> 	deactivate_task(rq, next_task, 0);
>> 	set_task_cpu(next_task, lowest_rq->cpu);
>> 
>> Besides, find_lock_lowest_rq() returning NULL doesn't mean the rq lock
>> was released. Do we need to add a _find_lock_lowest_rq() to get a
>> correct "rq released" flag?
>
> If it returns NULL it either means that the rq lock was released or that it
> did not find a rq to push to. Which means there's nothing more to do anyway.
>
>> 
>> > Perhaps add the check in find_lock_lowest_rq() and also in the !lowest_rq
>> > case do:
>> >
>> > 		task = pick_next_pushable_task(rq);
>> > 		if (task == next_task) {
>> > +			/*
>> > +			 * If next task has now disabled migrating, see if we
>> > +			 * can push the current task.
>> > +			 */
>> > +			if (unlikely(is_migration_disabled(task)))
>> > +				goto retry;  
>> 
>> Ahh, it can be added. Do we need this to be a separate patch?
>
> Sure.
>
> The "fix" to the crash you see should be in the find_lock_lowest_rq() as I
> suggested. And then you can add this as an optimization.

OK, I will make a V4 patch for this. Please review it then.
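
For reference, this is roughly how I read the decision in the !lowest_rq
case once the optimization is added. It is only a userspace sketch under
my own assumptions: the sketch_* names are hypothetical, and "picked"
stands for the result of pick_next_pushable_task(rq).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task; only the flag we care about. */
struct sketch_task {
	bool migration_disabled;  /* models is_migration_disabled(task) */
};

enum sketch_action {
	SKETCH_OUT,    /* give up; other CPUs will pull from us when ready */
	SKETCH_RETRY,  /* go back to the retry label */
};

/*
 * Decide what to do after failing to get a lowest_rq. "picked" is the
 * task now at the head of the pushable list, or NULL if there is none.
 */
static enum sketch_action
sketch_no_lowest_rq_action(const struct sketch_task *next_task,
			   const struct sketch_task *picked)
{
	if (picked == next_task) {
		/*
		 * Same task is still next. If it has disabled migration
		 * in the meantime, retry so that the push path can try
		 * to move the current task instead.
		 */
		if (next_task->migration_disabled)
			return SKETCH_RETRY;

		/* Still migratable but no target was found: stop. */
		return SKETCH_OUT;
	}

	if (!picked)
		return SKETCH_OUT;	/* no more pushable tasks */

	/* A different task is now next: retry with it. */
	return SKETCH_RETRY;
}
```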

>
> -- Steve
>
>> 
>> > 			/*
>> > 			 * The task hasn't migrated, and is still the next
>> > 			 * eligible task, but we failed to find a run-queue
>> > 			 * to push it to.  Do not retry in this case, since
>> > 			 * other CPUs will pull from us when ready.
>> > 			 */
>> > 			goto out;
>> > 		}
>> >
>> > -- Steve  
>> 

-- 
BRs
Schspa Shi