Message-ID: <xhsmhh6bty6wl.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Fri, 09 Aug 2024 18:53:30 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, linux-kernel@...r.kernel.org
Cc: kprateek.nayak@....com, wuyun.abel@...edance.com,
youssefesmat@...omium.org, tglx@...utronix.de, efault@....de
Subject: Re: [PATCH 07/24] sched/fair: Re-organize dequeue_task_fair()
On 27/07/24 12:27, Peter Zijlstra wrote:
> Working towards delaying dequeue, notably also inside the hierarchy,
> rework dequeue_task_fair() such that it can 'resume' an interrupted
> hierarchy walk.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> kernel/sched/fair.c | 61 ++++++++++++++++++++++++++++++++++------------------
> 1 file changed, 40 insertions(+), 21 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6861,34 +6861,43 @@ enqueue_task_fair(struct rq *rq, struct
> static void set_next_buddy(struct sched_entity *se);
>
> /*
> - * The dequeue_task method is called before nr_running is
> - * decreased. We remove the task from the rbtree and
> - * update the fair scheduling stats:
> + * Basically dequeue_task_fair(), except it can deal with dequeue_entity()
> + * failing half-way through and resume the dequeue later.
> + *
> + * Returns:
> + * -1 - dequeue delayed
> + * 0 - dequeue throttled
> + * 1 - dequeue complete
> */
> -static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> +static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
> {
> - struct cfs_rq *cfs_rq;
> - struct sched_entity *se = &p->se;
> - int task_sleep = flags & DEQUEUE_SLEEP;
> - int idle_h_nr_running = task_has_idle_policy(p);
> bool was_sched_idle = sched_idle_rq(rq);
> int rq_h_nr_running = rq->cfs.h_nr_running;
> + bool task_sleep = flags & DEQUEUE_SLEEP;
> + struct task_struct *p = NULL;
> + int idle_h_nr_running = 0;
> + int h_nr_running = 0;
> + struct cfs_rq *cfs_rq;
>
> - util_est_dequeue(&rq->cfs, p);
> + if (entity_is_task(se)) {
> + p = task_of(se);
> + h_nr_running = 1;
> + idle_h_nr_running = task_has_idle_policy(p);
> + }
>
This leaves *h_nr_running at 0 for non-task entities. IIUC this makes
sense for ->sched_delayed entities (they should be empty of tasks), but I'm
not so sure about the other case. However, dequeue_entities() only ends up
being called on non-task entities from:
- pick_next_entity(), if se->sched_delayed
- unregister_fair_sched_group()

IIRC unregister_fair_sched_group() can only happen after the group has been
drained, so it would then indeed be empty of tasks, but I reckon this could
do with a comment/assert in dequeue_entities(), no? Or did I get too
confused by cgroups again?
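
To make the invariant concrete, here is a rough userspace toy of what such an
assert would check. It is not kernel code and not a proposed patch: the
structs are cut-down stand-ins that only borrow the kernel's field names, and
dequeue_h_nr_running() is a hypothetical helper made up for illustration.

```c
/*
 * Userspace toy model, NOT kernel code. The types are simplified
 * stand-ins; dequeue_h_nr_running() is a hypothetical helper.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cfs_rq {
	unsigned int h_nr_running;	/* tasks queued in this hierarchy */
};

struct sched_entity {
	struct cfs_rq *my_q;		/* NULL for a task, owned queue for a group */
};

static bool entity_is_task(const struct sched_entity *se)
{
	return se->my_q == NULL;
}

/*
 * How many tasks a dequeue of @se should account for: a task counts
 * as 1, a group entity is assumed to be empty of tasks. The assert
 * documents (and checks) that assumption.
 */
static unsigned int dequeue_h_nr_running(const struct sched_entity *se)
{
	if (entity_is_task(se))
		return 1;

	/* Group entity: only valid once it has been drained of tasks. */
	assert(se->my_q->h_nr_running == 0);
	return 0;
}
```

In this toy, a drained group yields 0 and a task yields 1, while a group
that still has tasks queued trips the assert instead of being silently
accounted as 0.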