Message-ID: <d5a53c05-6ff3-4bb3-96e0-b8865bd81611@linux.ibm.com>
Date: Fri, 15 Mar 2024 10:07:22 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Swapnil Sapkal <Swapnil.Sapkal@....com>
Cc: linux-kernel@...r.kernel.org, Dietmar Eggemann
<dietmar.eggemann@....com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <vschneid@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
"Gautham R. Shenoy" <gautham.shenoy@....com>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
On 3/14/24 11:34 AM, Swapnil Sapkal wrote:
>
>
> Please find the patch below. This is based on tip/sched/core.
>
> ---
> From deeed5bf937bddf227deb1cdb9e2e6c164c48053 Mon Sep 17 00:00:00 2001
> From: Swapnil Sapkal <swapnil.sapkal@....com>
> Date: Thu, 15 Jun 2023 04:55:09 +0000
> Subject: [PATCH] sched: Report the different kinds of imbalances in
> /proc/schedstat
>
> In /proc/schedstat, lb_imbalance reports the sum of imbalances
> discovered in sched domains with each call to sched_balance_rq(), which is
> not very useful because lb_imbalance is an aggregate of the imbalances
> due to load, utilization, nr_tasks and misfit_tasks. Remove this field
> from /proc/schedstat.
>
Yes, this makes sense. Any one of them can skew the value, since load, utilization
and task counts are in different units, so their sum is not meaningful.
> Currently there are no fields in /proc/schedstat to report different types
> of imbalances. Introduce new fields in /proc/schedstat to report the
> total imbalances in load, utilization, nr_tasks or misfit_tasks.
>
> Added fields to /proc/schedstat:
> - lb_imbalance_load: Total imbalance due to load.
> - lb_imbalance_util: Total imbalance due to utilization.
> - lb_imbalance_task: Total imbalance due to number of tasks.
> - lb_imbalance_misfit: Total imbalance due to misfit tasks.
>
> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@....com>
> ---
> Documentation/scheduler/sched-stats.rst | 104 +++++++++++++-----------
> include/linux/sched/topology.h | 5 +-
> kernel/sched/fair.c | 15 +++-
> kernel/sched/stats.c | 7 +-
> 4 files changed, 80 insertions(+), 51 deletions(-)
>
> diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
> index 7c2b16c4729d..d6e9a8a5619c 100644
> --- a/Documentation/scheduler/sched-stats.rst
> +++ b/Documentation/scheduler/sched-stats.rst
> @@ -6,6 +6,9 @@ Version 16 of schedstats changed the order of definitions within
> 'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
> columns in show_schedstat(). In particular the position of CPU_IDLE
> and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
> +Also stop reporting 'lb_imbalance' as it has no significance anymore
> +and instead add more relevant fields namely 'lb_imbalance_load',
> +'lb_imbalance_util', 'lb_imbalance_task' and 'lb_imbalance_misfit'.
>
> Version 15 of schedstats dropped counters for some sched_yield:
> yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
> @@ -73,86 +76,93 @@ One of these is produced per domain for each cpu described. (Note that if
> CONFIG_SMP is not defined, *no* domains are utilized and these lines
> will not appear in the output.)
>
> -domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
> +domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
>
> The first field is a bit mask indicating what cpus this domain operates
> over.
>
IIUC, this edits the content describing Version 15, but the changes take effect in Version 16.
Instead, can we add a section for Version 16 and leave the Version 15 one unmodified?
> -The next 24 are a variety of sched_balance_rq() statistics in grouped into types
> +The next 33 are a variety of sched_balance_rq() statistics grouped into types
> of idleness (idle, busy, and newly idle):
>
> 1) # of times in this domain sched_balance_rq() was called when the
> + cpu was busy
> + 2) # of times in this domain sched_balance_rq() checked but found the
> + load did not require balancing when busy
> + 3) # of times in this domain sched_balance_rq() tried to move one or
> + more tasks and failed, when the cpu was busy
> + 4) Total imbalance in load when the cpu was busy
> + 5) Total imbalance in utilization when the cpu was busy
> + 6) Total imbalance in number of tasks when the cpu was busy
> + 7) Total imbalance due to misfit tasks when the cpu was busy
> + 8) # of times in this domain pull_task() was called when busy
> + 9) # of times in this domain pull_task() was called even though the
> + target task was cache-hot when busy
> + 10) # of times in this domain sched_balance_rq() was called but did not
> + find a busier queue while the cpu was busy
> + 11) # of times in this domain a busier queue was found while the cpu
> + was busy but no busier group was found
> +
> + 12) # of times in this domain sched_balance_rq() was called when the
> cpu was idle
> - 2) # of times in this domain sched_balance_rq() checked but found
> + 13) # of times in this domain sched_balance_rq() checked but found
> the load did not require balancing when the cpu was idle
> - 3) # of times in this domain sched_balance_rq() tried to move one or
> + 14) # of times in this domain sched_balance_rq() tried to move one or
> more tasks and failed, when the cpu was idle
> - 4) sum of imbalances discovered (if any) with each call to
> - sched_balance_rq() in this domain when the cpu was idle
> - 5) # of times in this domain pull_task() was called when the cpu
> + 15) Total imbalance in load when the cpu was idle
> + 16) Total imbalance in utilization when the cpu was idle
> + 17) Total imbalance in number of tasks when the cpu was idle
> + 18) Total imbalance due to misfit tasks when the cpu was idle
> + 19) # of times in this domain pull_task() was called when the cpu
> was idle
> - 6) # of times in this domain pull_task() was called even though
> + 20) # of times in this domain pull_task() was called even though
> the target task was cache-hot when idle
> - 7) # of times in this domain sched_balance_rq() was called but did
> + 21) # of times in this domain sched_balance_rq() was called but did
> not find a busier queue while the cpu was idle
> - 8) # of times in this domain a busier queue was found while the
> + 22) # of times in this domain a busier queue was found while the
> cpu was idle but no busier group was found
> - 9) # of times in this domain sched_balance_rq() was called when the
> - cpu was busy
> - 10) # of times in this domain sched_balance_rq() checked but found the
> - load did not require balancing when busy
> - 11) # of times in this domain sched_balance_rq() tried to move one or
> - more tasks and failed, when the cpu was busy
> - 12) sum of imbalances discovered (if any) with each call to
> - sched_balance_rq() in this domain when the cpu was busy
> - 13) # of times in this domain pull_task() was called when busy
> - 14) # of times in this domain pull_task() was called even though the
> - target task was cache-hot when busy
> - 15) # of times in this domain sched_balance_rq() was called but did not
> - find a busier queue while the cpu was busy
> - 16) # of times in this domain a busier queue was found while the cpu
> - was busy but no busier group was found
>
> - 17) # of times in this domain sched_balance_rq() was called when the
> - cpu was just becoming idle
> - 18) # of times in this domain sched_balance_rq() checked but found the
> + 23) # of times in this domain sched_balance_rq() was called when the
> + cpu was just becoming idle
> + 24) # of times in this domain sched_balance_rq() checked but found the
> load did not require balancing when the cpu was just becoming idle
> - 19) # of times in this domain sched_balance_rq() tried to move one or more
> + 25) # of times in this domain sched_balance_rq() tried to move one or more
> tasks and failed, when the cpu was just becoming idle
> - 20) sum of imbalances discovered (if any) with each call to
> - sched_balance_rq() in this domain when the cpu was just becoming idle
> - 21) # of times in this domain pull_task() was called when newly idle
> - 22) # of times in this domain pull_task() was called even though the
> + 26) Total imbalance in load when the cpu was just becoming idle
> + 27) Total imbalance in utilization when the cpu was just becoming idle
> + 28) Total imbalance in number of tasks when the cpu was just becoming idle
> + 29) Total imbalance due to misfit tasks when the cpu was just becoming idle
> + 30) # of times in this domain pull_task() was called when newly idle
> + 31) # of times in this domain pull_task() was called even though the
> target task was cache-hot when just becoming idle
> - 23) # of times in this domain sched_balance_rq() was called but did not
> + 32) # of times in this domain sched_balance_rq() was called but did not
> find a busier queue while the cpu was just becoming idle
> - 24) # of times in this domain a busier queue was found while the cpu
> + 33) # of times in this domain a busier queue was found while the cpu
> was just becoming idle but no busier group was found
>
> Next three are active_load_balance() statistics:
>
> - 25) # of times active_load_balance() was called
> - 26) # of times active_load_balance() tried to move a task and failed
> - 27) # of times active_load_balance() successfully moved a task
> + 34) # of times active_load_balance() was called
> + 35) # of times active_load_balance() tried to move a task and failed
> + 36) # of times active_load_balance() successfully moved a task
>
> Next three are sched_balance_exec() statistics:
>
> - 28) sbe_cnt is not used
> - 29) sbe_balanced is not used
> - 30) sbe_pushed is not used
> + 37) sbe_cnt is not used
> + 38) sbe_balanced is not used
> + 39) sbe_pushed is not used
>
> Next three are sched_balance_fork() statistics:
>
> - 31) sbf_cnt is not used
> - 32) sbf_balanced is not used
> - 33) sbf_pushed is not used
> + 40) sbf_cnt is not used
> + 41) sbf_balanced is not used
> + 42) sbf_pushed is not used
>
> Next three are try_to_wake_up() statistics:
>
> - 34) # of times in this domain try_to_wake_up() awoke a task that
> + 43) # of times in this domain try_to_wake_up() awoke a task that
> last ran on a different cpu in this domain
> - 35) # of times in this domain try_to_wake_up() moved a task to the
> + 44) # of times in this domain try_to_wake_up() moved a task to the
> waking cpu because it was cache-cold on its own cpu anyway
> - 36) # of times in this domain try_to_wake_up() started passive balancing
> + 45) # of times in this domain try_to_wake_up() started passive balancing
>
> /proc/<pid>/schedstat
> ---------------------
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index c8fe9bab981b..15685c40a713 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -114,7 +114,10 @@ struct sched_domain {
> unsigned int lb_count[CPU_MAX_IDLE_TYPES];
> unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
> unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
> - unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
> + unsigned int lb_imbalance_load[CPU_MAX_IDLE_TYPES];
> + unsigned int lb_imbalance_util[CPU_MAX_IDLE_TYPES];
> + unsigned int lb_imbalance_task[CPU_MAX_IDLE_TYPES];
> + unsigned int lb_imbalance_misfit[CPU_MAX_IDLE_TYPES];
> unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
> unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
> unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a19ea290b790..515258f97ba3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11288,7 +11288,20 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>
> WARN_ON_ONCE(busiest == env.dst_rq);
>
> - schedstat_add(sd->lb_imbalance[idle], env.imbalance);
> + switch (env.migration_type) {
> + case migrate_load:
> + schedstat_add(sd->lb_imbalance_load[idle], env.imbalance);
> + break;
> + case migrate_util:
> + schedstat_add(sd->lb_imbalance_util[idle], env.imbalance);
> + break;
> + case migrate_task:
> + schedstat_add(sd->lb_imbalance_task[idle], env.imbalance);
> + break;
> + case migrate_misfit:
> + schedstat_add(sd->lb_imbalance_misfit[idle], env.imbalance);
> + break;
> + }
>
This switch statement could use a helper function.
> env.src_cpu = busiest->cpu;
> env.src_rq = busiest;
> diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
> index 78e48f5426ee..a02bc9db2f1c 100644
> --- a/kernel/sched/stats.c
> +++ b/kernel/sched/stats.c
> @@ -151,11 +151,14 @@ static int show_schedstat(struct seq_file *seq, void *v)
> seq_printf(seq, "domain%d %*pb", dcount++,
> cpumask_pr_args(sched_domain_span(sd)));
> for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
> - seq_printf(seq, " %u %u %u %u %u %u %u %u",
> + seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u",
> sd->lb_count[itype],
> sd->lb_balanced[itype],
> sd->lb_failed[itype],
> - sd->lb_imbalance[itype],
> + sd->lb_imbalance_load[itype],
> + sd->lb_imbalance_util[itype],
> + sd->lb_imbalance_task[itype],
> + sd->lb_imbalance_misfit[itype],
> sd->lb_gained[itype],
> sd->lb_hot_gained[itype],
> sd->lb_nobusyq[itype],