Message-ID: <20240426105607.GK12673@noisy.programming.kicks-ass.net>
Date: Fri, 26 Apr 2024 12:56:07 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Mike Galbraith <efault@....de>
Cc: K Prateek Nayak <kprateek.nayak@....com>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org, wuyun.abel@...edance.com,
tglx@...utronix.de, Chen Yu <yu.c.chen@...el.com>,
Oliver Sang <oliver.sang@...el.com>
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

On Thu, Apr 25, 2024 at 01:28:55PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 18, 2024 at 06:24:59PM +0200, Mike Galbraith wrote:
> > The root cause seems to be doing the delay dequeue business on
> > exiting tasks.
>
> > ---
> > kernel/sched/fair.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5374,6 +5374,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
> >  	update_curr(cfs_rq);
> >
> >  	if (sched_feat(DELAY_DEQUEUE) && sleep &&
> > +	    !(entity_is_task(se) && (task_of(se)->flags & PF_EXITING)) &&
> >  	    !entity_eligible(cfs_rq, se)) {
> >  		if (cfs_rq->next == se)
> >  			cfs_rq->next = NULL;
>
> So I think this can be easier done in dequeue_task_fair(), where we
> still know this is a task.
>
> Perhaps something like (I'll test later):
>
> 	if (p->flags & PF_EXITING)
> 		flags &= ~DEQUEUE_SLEEP;
>
> But now I need to go think about the case of removing a cgroup...
> *urgh*.

I ended up with the below instead; lemme go run this unixbench spawn on it.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 95666034e76c..b5918fa9a0f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8429,7 +8431,20 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)

 static void task_dead_fair(struct task_struct *p)
 {
-	remove_entity_load_avg(&p->se);
+	struct sched_entity *se = &p->se;
+
+	if (p->se.sched_delayed) {
+		struct rq_flags rf;
+		struct rq *rq;
+
+		rq = task_rq_lock(p, &rf);
+		update_rq_clock(rq);
+		if (se->sched_delayed)
+			dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
+		task_rq_unlock(rq, p, &rf);
+	}
+
+	remove_entity_load_avg(se);
 }

 /*
@@ -13089,28 +13104,34 @@ void online_fair_sched_group(struct task_group *tg)

 void unregister_fair_sched_group(struct task_group *tg)
 {
-	unsigned long flags;
-	struct rq *rq;
 	int cpu;

 	destroy_cfs_bandwidth(tg_cfs_bandwidth(tg));

 	for_each_possible_cpu(cpu) {
-		if (tg->se[cpu])
-			remove_entity_load_avg(tg->se[cpu]);
+		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
+		struct sched_entity *se = tg->se[cpu];
+		struct rq *rq = cpu_rq(cpu);
+
+		if (se) {
+			if (se->sched_delayed) {
+				guard(rq_lock_irqsave)(rq);
+				update_rq_clock(rq);
+				if (se->sched_delayed)
+					dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
+				list_del_leaf_cfs_rq(cfs_rq);
+			}
+			remove_entity_load_avg(se);
+		}

 		/*
 		 * Only empty task groups can be destroyed; so we can speculatively
 		 * check on_list without danger of it being re-added.
 		 */
-		if (!tg->cfs_rq[cpu]->on_list)
-			continue;
-
-		rq = cpu_rq(cpu);
-
-		raw_spin_rq_lock_irqsave(rq, flags);
-		list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
-		raw_spin_rq_unlock_irqrestore(rq, flags);
+		if (cfs_rq->on_list) {
+			guard(rq_lock_irqsave)(rq);
+			list_del_leaf_cfs_rq(cfs_rq);
+		}
 	}
 }
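
Both hunks test sched_delayed once without the rq lock, then take the lock and test it again before calling dequeue_entities(). That check, lock, recheck shape can be tried outside the kernel. Below is a minimal userspace sketch, assuming pthreads and C11 atomics. struct toy_rq, struct toy_entity and the toy_* helpers are hypothetical stand-ins invented for illustration; they are not kernel API and do not model the scheduler itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a runqueue: a lock plus a counter. */
struct toy_rq {
	pthread_mutex_t lock;
	int nr_delayed;		/* entities still parked on the queue */
};

/* Hypothetical stand-in for a sched_entity. */
struct toy_entity {
	struct toy_rq *rq;
	atomic_bool delayed;	/* plays the role of se->sched_delayed */
};

static void toy_init(struct toy_rq *rq, struct toy_entity *e, bool delayed)
{
	pthread_mutex_init(&rq->lock, NULL);
	rq->nr_delayed = delayed ? 1 : 0;
	e->rq = rq;
	atomic_init(&e->delayed, delayed);
}

/* Caller must hold e->rq->lock. */
static void toy_finish_dequeue(struct toy_entity *e)
{
	assert(e->rq->nr_delayed > 0);
	e->rq->nr_delayed--;
	atomic_store(&e->delayed, false);
}

/* Returns true if this call performed the pending dequeue. */
static bool toy_entity_dead(struct toy_entity *e)
{
	bool did = false;

	/* Unlocked peek: skip the lock entirely when nothing is pending. */
	if (!atomic_load(&e->delayed))
		return false;

	pthread_mutex_lock(&e->rq->lock);
	/* Recheck under the lock: another path may have finished it already. */
	if (atomic_load(&e->delayed)) {
		toy_finish_dequeue(e);
		did = true;
	}
	pthread_mutex_unlock(&e->rq->lock);

	return did;
}
```

Calling toy_entity_dead() twice on an entity initialised with delayed set does the dequeue on the first call and returns early, without touching the lock, on the second.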