[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66f1e42c-9035-4f9b-8c77-976ab50638bd@amd.com>
Date: Thu, 14 Mar 2024 11:34:00 +0530
From: Swapnil Sapkal <Swapnil.Sapkal@....com>
To: "Gautham R. Shenoy" <gautham.shenoy@....com>,
Ingo Molnar <mingo@...nel.org>
Cc: Shrikanth Hegde <sshegde@...ux.ibm.com>, linux-kernel@...r.kernel.org,
Dietmar Eggemann <dietmar.eggemann@....com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <vschneid@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
On 3/14/2024 10:51 AM, Gautham R. Shenoy wrote:
> Hello Ingo, Shrikanth,
>
> On Tue, Mar 12, 2024 at 11:01:54AM +0100, Ingo Molnar wrote:
>>
>> * Shrikanth Hegde <sshegde@...ux.ibm.com> wrote:
>>
>>>
>>>
>>> On 3/8/24 4:28 PM, Ingo Molnar wrote:
>>>> We changed the order of definitions within 'enum cpu_idle_type',
>>>> which changed the order of [CPU_MAX_IDLE_TYPES] columns in
>>>> show_schedstat().
>>>>
>>>
>>>> +++ b/kernel/sched/stats.c
>>>> @@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
>>>> * Bump this up when changing the output format or the meaning of an existing
>>>> * format, so that tools can adapt (or abort)
>>>> */
>>>> -#define SCHEDSTAT_VERSION 15
>>>> +#define SCHEDSTAT_VERSION 16
>>>
>>> Please add the info about version, and change of the order
>>> briefly in Documentation/scheduler/sched-stats.rst as well.
>>
>> Good point, I've added this paragraph to sched-stats.rst:
>>
>> +Version 16 of schedstats changed the order of definitions within
>> +'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
>> +columns in show_schedstat(). In particular the position of CPU_IDLE
>> +and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
>
> Thanks for this change!
>
> Since we are considering bumping up the version for this change, it
> might be good to get rid of "lb_imbalance" which is currently a sum of
> all the different kinds of imbalances (due to load, utilization,
> number of tasks, misfit tasks). This is not useful.
>
> Swapnil had a patch to print the different imbalances, which will
> change the number of fields. Can we include this change into v16 as
> well?
>
> Swapnil, could you please rebase your patch on tip/sched/core and post
> it ?
Please find the patch below. This is based on tip/sched/core.
---
From deeed5bf937bddf227deb1cdb9e2e6c164c48053 Mon Sep 17 00:00:00 2001
From: Swapnil Sapkal <swapnil.sapkal@....com>
Date: Thu, 15 Jun 2023 04:55:09 +0000
Subject: [PATCH] sched: Report the different kinds of imbalances in /proc/schedstat
In /proc/schedstat, lb_imbalance reports the sum of imbalances
discovered in sched domains with each call to sched_balance_rq(), which is
not very useful because lb_imbalance is an aggregate of the imbalances
due to load, utilization, nr_tasks and misfit_tasks. Remove this field
from /proc/schedstat.
Currently there are no fields in /proc/schedstat to report different types
of imbalances. Introduce new fields in /proc/schedstat to report the
total imbalances in load, utilization, nr_tasks or misfit_tasks.
Added fields to /proc/schedstat:
- lb_imbalance_load: Total imbalance due to load.
- lb_imbalance_util: Total imbalance due to utilization.
- lb_imbalance_task: Total imbalance due to number of tasks.
- lb_imbalance_misfit: Total imbalance due to misfit tasks.
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@....com>
---
Documentation/scheduler/sched-stats.rst | 104 +++++++++++++-----------
include/linux/sched/topology.h | 5 +-
kernel/sched/fair.c | 15 +++-
kernel/sched/stats.c | 7 +-
4 files changed, 80 insertions(+), 51 deletions(-)
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 7c2b16c4729d..d6e9a8a5619c 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -6,6 +6,9 @@ Version 16 of schedstats changed the order of definitions within
'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
columns in show_schedstat(). In particular the position of CPU_IDLE
and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
+Also stop reporting 'lb_imbalance' as it has no significance anymore
+and instead add more relevant fields namely 'lb_imbalance_load',
+'lb_imbalance_util', 'lb_imbalance_task' and 'lb_imbalance_misfit'.
Version 15 of schedstats dropped counters for some sched_yield:
yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
@@ -73,86 +76,93 @@ One of these is produced per domain for each cpu described. (Note that if
CONFIG_SMP is not defined, *no* domains are utilized and these lines
will not appear in the output.)
-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
The first field is a bit mask indicating what cpus this domain operates over.
-The next 24 are a variety of sched_balance_rq() statistics in grouped into types
+The next 33 are a variety of sched_balance_rq() statistics in grouped into types
of idleness (idle, busy, and newly idle):
1) # of times in this domain sched_balance_rq() was called when the
+ cpu was busy
+ 2) # of times in this domain sched_balance_rq() checked but found the
+ load did not require balancing when busy
+ 3) # of times in this domain sched_balance_rq() tried to move one or
+ more tasks and failed, when the cpu was busy
+ 4) Total imbalance in load when the cpu was busy
+ 5) Total imbalance in utilization when the cpu was busy
+ 6) Total imbalance in number of tasks when the cpu was busy
+ 7) Total imbalance due to misfit tasks when the cpu was busy
+ 8) # of times in this domain pull_task() was called when busy
+ 9) # of times in this domain pull_task() was called even though the
+ target task was cache-hot when busy
+ 10) # of times in this domain sched_balance_rq() was called but did not
+ find a busier queue while the cpu was busy
+ 11) # of times in this domain a busier queue was found while the cpu
+ was busy but no busier group was found
+
+ 12) # of times in this domain sched_balance_rq() was called when the
cpu was idle
- 2) # of times in this domain sched_balance_rq() checked but found
+ 13) # of times in this domain sched_balance_rq() checked but found
the load did not require balancing when the cpu was idle
- 3) # of times in this domain sched_balance_rq() tried to move one or
+ 14) # of times in this domain sched_balance_rq() tried to move one or
more tasks and failed, when the cpu was idle
- 4) sum of imbalances discovered (if any) with each call to
- sched_balance_rq() in this domain when the cpu was idle
- 5) # of times in this domain pull_task() was called when the cpu
+ 15) Total imbalance in load when the cpu was idle
+ 16) Total imbalance in utilization when the cpu was idle
+ 17) Total imbalance in number of tasks when the cpu was idle
+ 18) Total imbalance due to misfit tasks when the cpu was idle
+ 19) # of times in this domain pull_task() was called when the cpu
was idle
- 6) # of times in this domain pull_task() was called even though
+ 20) # of times in this domain pull_task() was called even though
the target task was cache-hot when idle
- 7) # of times in this domain sched_balance_rq() was called but did
+ 21) # of times in this domain sched_balance_rq() was called but did
not find a busier queue while the cpu was idle
- 8) # of times in this domain a busier queue was found while the
+ 22) # of times in this domain a busier queue was found while the
cpu was idle but no busier group was found
- 9) # of times in this domain sched_balance_rq() was called when the
- cpu was busy
- 10) # of times in this domain sched_balance_rq() checked but found the
- load did not require balancing when busy
- 11) # of times in this domain sched_balance_rq() tried to move one or
- more tasks and failed, when the cpu was busy
- 12) sum of imbalances discovered (if any) with each call to
- sched_balance_rq() in this domain when the cpu was busy
- 13) # of times in this domain pull_task() was called when busy
- 14) # of times in this domain pull_task() was called even though the
- target task was cache-hot when busy
- 15) # of times in this domain sched_balance_rq() was called but did not
- find a busier queue while the cpu was busy
- 16) # of times in this domain a busier queue was found while the cpu
- was busy but no busier group was found
- 17) # of times in this domain sched_balance_rq() was called when the
- cpu was just becoming idle
- 18) # of times in this domain sched_balance_rq() checked but found the
+ 23) # of times in this domain sched_balance_rq() was called when the
+ was just becoming idle
+ 24) # of times in this domain sched_balance_rq() checked but found the
load did not require balancing when the cpu was just becoming idle
- 19) # of times in this domain sched_balance_rq() tried to move one or more
+ 25) # of times in this domain sched_balance_rq() tried to move one or more
tasks and failed, when the cpu was just becoming idle
- 20) sum of imbalances discovered (if any) with each call to
- sched_balance_rq() in this domain when the cpu was just becoming idle
- 21) # of times in this domain pull_task() was called when newly idle
- 22) # of times in this domain pull_task() was called even though the
+ 26) Total imbalance in load when the cpu was just becoming idle
+ 27) Total imbalance in utilization when the cpu was just becoming idle
+ 28) Total imbalance in number of tasks when the cpu was just becoming idle
+ 29) Total imbalance due to misfit tasks when the cpu was just becoming idle
+ 30) # of times in this domain pull_task() was called when newly idle
+ 31) # of times in this domain pull_task() was called even though the
target task was cache-hot when just becoming idle
- 23) # of times in this domain sched_balance_rq() was called but did not
+ 32) # of times in this domain sched_balance_rq() was called but did not
find a busier queue while the cpu was just becoming idle
- 24) # of times in this domain a busier queue was found while the cpu
+ 33) # of times in this domain a busier queue was found while the cpu
was just becoming idle but no busier group was found
Next three are active_load_balance() statistics:
- 25) # of times active_load_balance() was called
- 26) # of times active_load_balance() tried to move a task and failed
- 27) # of times active_load_balance() successfully moved a task
+ 34) # of times active_load_balance() was called
+ 35) # of times active_load_balance() tried to move a task and failed
+ 36) # of times active_load_balance() successfully moved a task
Next three are sched_balance_exec() statistics:
- 28) sbe_cnt is not used
- 29) sbe_balanced is not used
- 30) sbe_pushed is not used
+ 37) sbe_cnt is not used
+ 38) sbe_balanced is not used
+ 39) sbe_pushed is not used
Next three are sched_balance_fork() statistics:
- 31) sbf_cnt is not used
- 32) sbf_balanced is not used
- 33) sbf_pushed is not used
+ 40) sbf_cnt is not used
+ 41) sbf_balanced is not used
+ 42) sbf_pushed is not used
Next three are try_to_wake_up() statistics:
- 34) # of times in this domain try_to_wake_up() awoke a task that
+ 43) # of times in this domain try_to_wake_up() awoke a task that
last ran on a different cpu in this domain
- 35) # of times in this domain try_to_wake_up() moved a task to the
+ 44) # of times in this domain try_to_wake_up() moved a task to the
waking cpu because it was cache-cold on its own cpu anyway
- 36) # of times in this domain try_to_wake_up() started passive balancing
+ 45) # of times in this domain try_to_wake_up() started passive balancing
/proc/<pid>/schedstat
---------------------
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index c8fe9bab981b..15685c40a713 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -114,7 +114,10 @@ struct sched_domain {
unsigned int lb_count[CPU_MAX_IDLE_TYPES];
unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
- unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
+ unsigned int lb_imbalance_load[CPU_MAX_IDLE_TYPES];
+ unsigned int lb_imbalance_util[CPU_MAX_IDLE_TYPES];
+ unsigned int lb_imbalance_task[CPU_MAX_IDLE_TYPES];
+ unsigned int lb_imbalance_misfit[CPU_MAX_IDLE_TYPES];
unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a19ea290b790..515258f97ba3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11288,7 +11288,20 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
WARN_ON_ONCE(busiest == env.dst_rq);
- schedstat_add(sd->lb_imbalance[idle], env.imbalance);
+ switch (env.migration_type) {
+ case migrate_load:
+ schedstat_add(sd->lb_imbalance_load[idle], env.imbalance);
+ break;
+ case migrate_util:
+ schedstat_add(sd->lb_imbalance_util[idle], env.imbalance);
+ break;
+ case migrate_task:
+ schedstat_add(sd->lb_imbalance_task[idle], env.imbalance);
+ break;
+ case migrate_misfit:
+ schedstat_add(sd->lb_imbalance_misfit[idle], env.imbalance);
+ break;
+ }
env.src_cpu = busiest->cpu;
env.src_rq = busiest;
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 78e48f5426ee..a02bc9db2f1c 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -151,11 +151,14 @@ static int show_schedstat(struct seq_file *seq, void *v)
seq_printf(seq, "domain%d %*pb", dcount++,
cpumask_pr_args(sched_domain_span(sd)));
for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
- seq_printf(seq, " %u %u %u %u %u %u %u %u",
+ seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u",
sd->lb_count[itype],
sd->lb_balanced[itype],
sd->lb_failed[itype],
- sd->lb_imbalance[itype],
+ sd->lb_imbalance_load[itype],
+ sd->lb_imbalance_util[itype],
+ sd->lb_imbalance_task[itype],
+ sd->lb_imbalance_misfit[itype],
sd->lb_gained[itype],
sd->lb_hot_gained[itype],
sd->lb_nobusyq[itype],
--
2.34.1
>
>>
>>> One recent user that I recollect is sched scoreboard. Version number should
>>> be able to help the users when they it is breaking their scripts.
>>>
>
>>> +Gautham, Any thoughts?
>
> Yes, the scoreboard looks at the version. If it doesn't understand a
> version, it will not parse it. It can be extended to understand the
> new version, since that's only about adding a new json.
>
>>
>> If it's really a problem, I suspect we could maintain the v15 order of
>> output.
>
> I don't think moving to v16 is an issue.
>
>>
>> Thanks,
>>
>> Ingo
>
> --
> Thanks and Regards
> gautham.
--
Thanks and Regards,
Swapnil
Powered by blists - more mailing lists