Message-ID: <20210310144453.u756vzktfdd3vxmy@e107158-lin.cambridge.arm.com>
Date: Wed, 10 Mar 2021 14:44:53 +0000
From: Qais Yousef <qais.yousef@....com>
To: Valentin Schneider <valentin.schneider@....com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>
Cc: tglx@...utronix.de, mingo@...nel.org, linux-kernel@...r.kernel.org,
bigeasy@...utronix.de, swood@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vincent.donnefort@....com, tj@...nel.org,
ouwen210@...mail.com
Subject: Re: [PATCH v4 15/19] sched: Fix migrate_disable() vs rt/dl balancing
On 03/05/21 15:41, Valentin Schneider wrote:
> On 05/03/21 15:56, Peter Zijlstra wrote:
> > On Sat, Dec 26, 2020 at 01:54:45PM +0000, Qais Yousef wrote:
> >>
> >> > +static inline struct task_struct *get_push_task(struct rq *rq)
> >> > +{
> >> > + struct task_struct *p = rq->curr;
> >>
> >> Shouldn't we verify the class of the task here? The RT task in migration
> >> disabled could have been preempted by a dl or stopper task. Similarly, the dl
> >> task could have been preempted by a stopper task.
> >>
> >> I don't think an RT task should be allowed to push a dl task under any
> >> circumstances?
> >
> > Hmm, quite. Fancy doing a patch?
>
> Last time we talked about this, I looked into
>
> push_rt_task() + find_lowest_rq()
>
> IIRC, with how
>
> find_lowest_rq() + cpupri_find_fitness()
>
> currently work, find_lowest_rq() should return -1 in push_rt_task() if
> rq->curr is DL (CPUPRI_INVALID). IOW, Migration-Disabled RT tasks shouldn't
> actually interfere with DL tasks (unless a DL task gets scheduled after we
> drop the rq lock and kick the stopper, but we have that problem everywhere
> including CFS active balance).
This makes it less of a problem, true, but AFAICT this can still happen in the
pull path.
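For illustration, here's a minimal userspace sketch (not kernel code; the
types, values and main() below are made up for this example) of the situation
the patch guards against: the RT pull path asks for a push candidate, but
src_rq->curr happens to be a DL task, so handing it to the stopper on behalf
of an RT task would invert the class priorities.

#include <stdio.h>

/* Illustrative stand-ins for the real kernel policies and types. */
enum { SCHED_FIFO = 1, SCHED_RR = 2, SCHED_DEADLINE = 6 };

struct task { const char *comm; int policy; };
struct rq { struct task *curr; int push_busy; };

/* Simplified model of get_push_task() with the class check applied. */
static struct task *get_push_task(struct rq *rq, int policy)
{
	struct task *p = rq->curr;

	switch (policy) {
	case SCHED_FIFO:
	case SCHED_RR:
		/* RT balancing: don't hand back a task of another class. */
		if (p->policy != SCHED_FIFO && p->policy != SCHED_RR)
			return NULL;
		break;
	case SCHED_DEADLINE:
		if (p->policy != SCHED_DEADLINE)
			return NULL;
		break;
	default:
		return NULL;
	}

	if (rq->push_busy)
		return NULL;

	return p;
}

int main(void)
{
	struct task dl_curr = { "dl-task", SCHED_DEADLINE };
	struct rq src_rq = { &dl_curr, 0 };

	/* RT pull path: without the check this would return the DL task. */
	struct task *p = get_push_task(&src_rq, SCHED_FIFO);

	printf("RT pull candidate: %s\n", p ? p->comm : "(none, as intended)");
	return 0;
}

The sketch compares policies directly for simplicity; the actual patch below
uses rt_task()/dl_task() on rq->curr instead.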
Anyway, here's the patch as extra belt and braces to be considered.
Thanks
--
Qais Yousef
--->8----
From 2df733d381f636cc185944c7eda86c824a9a785e Mon Sep 17 00:00:00 2001
From: Qais Yousef <qais.yousef@....com>
Date: Tue, 12 Jan 2021 11:54:16 +0000
Subject: [PATCH] sched: Don't push a higher priority class in get_push_task()
Commit a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
will attempt to push/pull a higher priority task if the candidate task
is in a migrate_disable() section. This is an attempt to prevent
starvation of these lower priority tasks which, in theory at least,
could end up in a situation where they're forever in a migrate_disable()
section with no CPU time to run.
One issue with that is get_push_task() assumes rq->curr is of the same
sched_class, which AFAICT is not guaranteed to be true.
This patch adds extra belt and braces to ensure that this voluntary
push operation is only performed on a task of the same scheduling
class. Otherwise an RT task could end up causing a DL task to be
pushed away, which breaks the strict priority between sched classes.
We could also end up trying to push the migration task, which I think
is harmless and nothing but wasted effort.
Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Qais Yousef <qais.yousef@....com>
---
kernel/sched/deadline.c | 2 +-
kernel/sched/rt.c | 4 ++--
kernel/sched/sched.h | 17 ++++++++++++++++-
3 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index aac3539aa0fe..afadc7e1f968 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2276,7 +2276,7 @@ static void pull_dl_task(struct rq *this_rq)
goto skip;
if (is_migration_disabled(p)) {
- push_task = get_push_task(src_rq);
+ push_task = get_push_task(src_rq, SCHED_DEADLINE);
} else {
deactivate_task(src_rq, p, 0);
set_task_cpu(p, this_cpu);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8f720b71d13d..c2c5c08e3030 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1892,7 +1892,7 @@ static int push_rt_task(struct rq *rq, bool pull)
* to this other CPU, instead attempt to push the current
* running task on this CPU away.
*/
- push_task = get_push_task(rq);
+ push_task = get_push_task(rq, SCHED_FIFO);
if (push_task) {
raw_spin_unlock(&rq->lock);
stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
@@ -2225,7 +2225,7 @@ static void pull_rt_task(struct rq *this_rq)
goto skip;
if (is_migration_disabled(p)) {
- push_task = get_push_task(src_rq);
+ push_task = get_push_task(src_rq, SCHED_FIFO);
} else {
deactivate_task(src_rq, p, 0);
set_task_cpu(p, this_cpu);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10a1522b1e30..4e156f008d22 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1954,12 +1954,27 @@ extern void trigger_load_balance(struct rq *rq);
extern void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask, u32 flags);
-static inline struct task_struct *get_push_task(struct rq *rq)
+static inline struct task_struct *get_push_task(struct rq *rq, int policy)
{
struct task_struct *p = rq->curr;
lockdep_assert_held(&rq->lock);
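+ /*
+  * Don't allow pushing rq->curr on behalf of a different scheduling
+  * class: a voluntary push only makes sense within the same class.
+  */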
+ switch (policy) {
+ case SCHED_FIFO:
+ case SCHED_RR:
+ if (!rt_task(p))
+ return NULL;
+ break;
+ case SCHED_DEADLINE:
+ if (!dl_task(p))
+ return NULL;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return NULL;
+ }
+
if (rq->push_busy)
return NULL;
--
2.25.1