Message-ID: <20241129161756.3081386-1-vincent.guittot@linaro.org>
Date: Fri, 29 Nov 2024 17:17:46 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: mingo@...hat.com,
peterz@...radead.org,
juri.lelli@...hat.com,
dietmar.eggemann@....com,
rostedt@...dmis.org,
bsegall@...gle.com,
mgorman@...e.de,
vschneid@...hat.com,
linux-kernel@...r.kernel.org
Cc: kprateek.nayak@....com,
pauld@...hat.com,
efault@....de,
luis.machado@....com,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: [PATCH 0/10 v2] sched/fair: Fix statistics with delayed dequeue

The delayed dequeue feature keeps a sleeping sched_entity enqueued until its
lag has elapsed. As a result, the entity also stays visible in the statistics
that are used to balance the system, in particular the field h_nr_running.
This series fixes those metrics by renaming h_nr_running to h_nr_queued,
which tracks all queued tasks, and by adding a new h_nr_runnable that
restores the original behavior of h_nr_running, i.e. tracking the number of
fair tasks that want to run.

h_nr_runnable is used in several places to make load balancing decisions:
- PELT runnable_avg
- deciding if a group is overloaded or has spare capacity
- NUMA stats
- reduced capacity management
- load balance between groups

While fixing h_nr_running, some fields have been renamed to follow the
same pattern. We now have:
- cfs.h_nr_runnable : runnable tasks in the hierarchy
- cfs.h_nr_queued : enqueued tasks in the hierarchy, either runnable or
  delayed dequeue
- cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy
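The relationship between the queued and runnable counters can be pictured
with a small userspace model. This is only a sketch under stated
assumptions, not the kernel code: struct toy_cfs_rq and all toy_*() helpers
are hypothetical names, the hierarchy is flat, and every queued task is
assumed to be either runnable or delayed dequeue.

```c
#include <assert.h>

/* Hypothetical userspace model; this is not the kernel's struct cfs_rq. */
struct toy_cfs_rq {
	unsigned int h_nr_queued;	/* all enqueued tasks: runnable + delayed dequeue */
	unsigned int h_nr_runnable;	/* enqueued tasks that want to run */
};

/* A task that wants to run is enqueued: both counters grow. */
static inline void toy_enqueue(struct toy_cfs_rq *cfs)
{
	cfs->h_nr_queued++;
	cfs->h_nr_runnable++;
}

/* A task goes to sleep but stays enqueued until its lag has elapsed. */
static inline void toy_delay_dequeue(struct toy_cfs_rq *cfs)
{
	assert(cfs->h_nr_runnable > 0);
	cfs->h_nr_runnable--;
}

/* The delayed task is finally removed from the runqueue. */
static inline void toy_finish_delayed_dequeue(struct toy_cfs_rq *cfs)
{
	assert(cfs->h_nr_queued > cfs->h_nr_runnable);
	cfs->h_nr_queued--;
}

/* In this model the number of delayed tasks is derived, not stored. */
static inline unsigned int toy_nr_delayed(const struct toy_cfs_rq *cfs)
{
	return cfs->h_nr_queued - cfs->h_nr_runnable;
}
```

In this model, a sleeping task that is still enqueued lowers h_nr_runnable
but not h_nr_queued, which is the distinction the renamed fields are meant
to expose.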

cfs.nr_running has been renamed to cfs.nr_queued because it includes the
delayed dequeue entities.

The unused cfs.idle_nr_running has been removed.

Load balance now compares the number of runnable tasks when selecting the
busiest group or runqueue, and tries to migrate a runnable task rather than
a sleeping delayed dequeue one.
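A rough illustration of that selection, assuming a toy model: the types
struct toy_rq and struct toy_task and both helpers are hypothetical, and
the real load balancer also weighs load, capacity and group type, which
this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-runqueue counters; not the kernel's struct rq. */
struct toy_rq {
	unsigned int h_nr_queued;	/* runnable + delayed dequeue tasks */
	unsigned int h_nr_runnable;	/* tasks that want to run */
};

/* Hypothetical task state; only the delayed dequeue flag is modeled. */
struct toy_task {
	bool delayed;
};

/*
 * Pick the runqueue with the most runnable tasks. Delayed dequeue tasks
 * are queued but sleeping, so they do not make a runqueue look busier.
 * Returns -1 if no runqueue has a runnable task.
 */
static inline int toy_find_busiest(const struct toy_rq *rqs, size_t n)
{
	unsigned int max = 0;
	int busiest = -1;

	for (size_t i = 0; i < n; i++) {
		assert(rqs[i].h_nr_runnable <= rqs[i].h_nr_queued);
		if (rqs[i].h_nr_runnable > max) {
			max = rqs[i].h_nr_runnable;
			busiest = (int)i;
		}
	}
	return busiest;
}

/* A sleeping delayed dequeue task is not a migration candidate. */
static inline bool toy_can_migrate(const struct toy_task *p)
{
	return !p->delayed;
}
```

With two runqueues holding {queued 3, runnable 1} and {queued 2, runnable 2},
this model selects the second one, whereas a comparison based on queued
tasks would have selected the first.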

Note that this series doesn't fix the problem of delayed dequeue tasks
that can't migrate at wakeup.

Some additional cleanups have been added:
- move the variable declaration to the beginning of pick_next_entity()
- sched_can_stop_tick() should use cfs.h_nr_queued instead of
  cfs.nr_queued (previously cfs.nr_running) to know how many tasks
  are running in the whole hierarchy instead of how many entities
  are at root level
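The difference between the two counters in the sched_can_stop_tick() case
can be sketched as follows. This is a simplified model, not the kernel
function: struct toy_root_cfs_rq and both helpers are hypothetical, and
the only rule modeled is that the tick is needed when more than one fair
task is queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the root cfs_rq counters. */
struct toy_root_cfs_rq {
	unsigned int nr_queued;		/* entities (tasks or groups) at root level */
	unsigned int h_nr_queued;	/* tasks queued in the whole hierarchy */
};

/* Root-level check: a group entity hides the tasks queued below it. */
static inline bool toy_root_level_check(const struct toy_root_cfs_rq *root)
{
	assert(root);
	return root->nr_queued <= 1;
}

/* Hierarchy-wide check: counts every queued fair task. */
static inline bool toy_fair_can_stop_tick(const struct toy_root_cfs_rq *root)
{
	assert(root);
	return root->h_nr_queued <= 1;
}
```

For a root cfs_rq holding a single group entity that contains two tasks
(nr_queued = 1, h_nr_queued = 2), the root-level check would allow the tick
to stop, while the hierarchy-wide check keeps it running.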

Changes since v1:
- reorder the patches
- rename fields into:
  - h_nr_queued for all queued tasks, both runnable and delayed dequeue
  - h_nr_runnable for all runnable tasks
  - h_nr_idle for all tasks with sched_idle policy
- clean up how h_nr_runnable is updated in enqueue_task_fair() and
  dequeue_entities()

Peter Zijlstra (1):
  sched/eevdf: More PELT vs DELAYED_DEQUEUE

Vincent Guittot (9):
  sched/fair: Rename h_nr_running into h_nr_queued
  sched/fair: Add new cfs_rq.h_nr_runnable
  sched/fair: Removed unsued cfs_rq.h_nr_delayed
  sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle
  sched/fair: Remove unused cfs_rq.idle_nr_running
  sched/fair: Rename cfs_rq.nr_running into nr_queued
  sched/fair: Do not try to migrate delayed dequeue task
  sched/fair: Fix sched_can_stop_tick() for fair tasks
  sched/fair: Fix variable declaration position

 kernel/sched/core.c  |   4 +-
 kernel/sched/debug.c |  15 ++-
 kernel/sched/fair.c  | 236 +++++++++++++++++++++++++------------------
 kernel/sched/pelt.c  |   4 +-
 kernel/sched/sched.h |  12 +--
 5 files changed, 152 insertions(+), 119 deletions(-)
--
2.43.0