Message-ID: <20220708154111.36e662b2@gandalf.local.home>
Date: Fri, 8 Jul 2022 15:41:11 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Schspa Shi <schspa@...il.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks
On Sat, 09 Jul 2022 03:14:44 +0800
Schspa Shi <schspa@...il.com> wrote:
> Steven Rostedt <rostedt@...dmis.org> writes:
>
> > On Sat, 09 Jul 2022 02:19:42 +0800
> > Schspa Shi <schspa@...il.com> wrote:
> >
> >> Yes, it's what I did in the V1 patch.
> >> Link: https://lore.kernel.org/all/20220623182932.58589-1-schspa@gmail.com/
> >>
> >> But I think it's not the best solution for this problem.
> >> In these scenarios, we still have a chance to make the task run faster
> >> by retrying to push the currently running task away from this CPU.
> >>
> >> There are more details in the reply to the V2 patch.
> >> Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t
> >
> > The thing is, this situation can only happen if we release the rq lock in
> > find_lock_lowest_rq(), and we should not be checking for it in the other
> > cases.
> >
>
> If we haven't unlocked the rq in find_lock_lowest_rq(), it will return
> NULL. It won't reach this added code:
>
> 	if (unlikely(is_migration_disabled(next_task))) {
> 		put_task_struct(next_task);
> 		goto retry;
> 	}
Because it doesn't need to. If it did not unlock the run queue, there's no
way that next_task could have run, because we hold the rq lock for
next_task. Which means that its "migrate_disable" state would not have
changed from the first time we checked.
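
To make the locking argument concrete, here is a minimal userspace sketch,
not kernel code. Every toy_ type and function below is a hypothetical
stand-in invented for illustration; only the shape of the logic follows the
discussion. The task can run (and so change its migrate-disable state) only
in the window where the rq lock is dropped, so that is the only path that
needs the recheck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's struct task_struct / struct rq. */
struct toy_task {
	bool migration_disabled;
};

struct toy_rq {
	int cpu;
	bool locked;
};

/*
 * Models next_task getting CPU time. It can only happen while the task's
 * run queue is unlocked, which the assert enforces.
 */
static void toy_task_runs(struct toy_rq *rq, struct toy_task *t)
{
	assert(!rq->locked);
	t->migration_disabled = true;	/* as if it called migrate_disable() */
}

/*
 * Simplified model of locking a second run queue. When must_drop is true,
 * the caller's lock is released and re-taken, so the task state has to be
 * revalidated. When it is false, the lock was held throughout and nothing
 * about the task can have changed.
 */
static struct toy_rq *toy_find_lock_lowest_rq(struct toy_rq *rq,
					      struct toy_rq *lowest,
					      struct toy_task *t,
					      bool must_drop)
{
	assert(rq->locked);

	if (must_drop) {
		rq->locked = false;
		toy_task_runs(rq, t);	/* worst case: the task ran in the window */
		rq->locked = true;

		/* The recheck belongs here, on the only path that needs it. */
		if (t->migration_disabled)
			return NULL;
	}

	lowest->locked = true;
	return lowest;
}
```

With must_drop == false the flag cannot change and the function returns the
locked target. With must_drop == true the toy task disables migration in the
window and the function returns NULL, so the caller never reaches the
deactivate/set_task_cpu step.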
>
> 	deactivate_task(rq, next_task, 0);
> 	set_task_cpu(next_task, lowest_rq->cpu);
>
> Besides, find_lock_lowest_rq() returning NULL doesn't mean the rq was released.
> Do we need to add a _find_lock_lowest_rq() to get the correct rq-released
> flag?
If it returns NULL it either means that the rq lock was released or that it
did not find a rq to push to. Which means there's nothing more to do anyway.
>
> > Perhaps add the check in find_lock_lowest_rq() and also in the !lowest_rq
> > case do:
> >
> > 	task = pick_next_pushable_task(rq);
> > 	if (task == next_task) {
> > +		/*
> > +		 * If next task has now disabled migrating, see if we
> > +		 * can push the current task.
> > +		 */
> > +		if (unlikely(is_migration_disabled(task)))
> > +			goto retry;
>
> Ahh, it can be added. Do we need this to be a separate patch?
Sure.
The "fix" to the crash you see should be in find_lock_lowest_rq() as I
suggested. And then you can add this as an optimization.
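
A rough sketch of how the caller's decision could look once both pieces are
in, written as a pure function so it compiles in userspace. The enum and
function names are hypothetical. The handling of "no task picked" and "a
different task picked" is not shown in this thread and is an assumption
here.

```c
#include <assert.h>
#include <stdbool.h>

/* What re-picking the next pushable task returned (hypothetical model). */
enum toy_pick {
	TOY_PICK_NONE,	/* nothing left to push */
	TOY_PICK_SAME,	/* still the task we started with */
	TOY_PICK_OTHER,	/* a different task is now next */
};

enum toy_action {
	TOY_PUSH,	/* target rq is locked and the task is valid: migrate it */
	TOY_RETRY,	/* start over from the top of the push path */
	TOY_OUT,	/* give up; other CPUs will pull when ready */
};

static enum toy_action toy_next_step(bool have_lowest_rq,
				     enum toy_pick pick,
				     bool migration_disabled)
{
	/* The fix: a non-NULL target is only handed back for a valid task. */
	if (have_lowest_rq)
		return TOY_PUSH;

	switch (pick) {
	case TOY_PICK_SAME:
		/*
		 * The optimization: the task disabled migration in the
		 * meantime, so retry and see whether the current task can
		 * be pushed away instead.
		 */
		if (migration_disabled)
			return TOY_RETRY;
		return TOY_OUT;
	case TOY_PICK_OTHER:
		return TOY_RETRY;	/* assumed: try again with the new task */
	case TOY_PICK_NONE:
	default:
		return TOY_OUT;		/* assumed: nothing left to do */
	}
}
```

The point of the split is that the crash fix never depends on the retry
branch: removing the migration_disabled test only loses the chance to push
the current task, it does not let a migrate-disabled task be moved.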
-- Steve
>
> > 		/*
> > 		 * The task hasn't migrated, and is still the next
> > 		 * eligible task, but we failed to find a run-queue
> > 		 * to push it to. Do not retry in this case, since
> > 		 * other CPUs will pull from us when ready.
> > 		 */
> > 		goto out;
> > 	}
> >
> > -- Steve
>