Message-ID: <66f1e42c-9035-4f9b-8c77-976ab50638bd@amd.com>
Date: Thu, 14 Mar 2024 11:34:00 +0530
From: Swapnil Sapkal <Swapnil.Sapkal@....com>
To: "Gautham R. Shenoy" <gautham.shenoy@....com>,
 Ingo Molnar <mingo@...nel.org>
Cc: Shrikanth Hegde <sshegde@...ux.ibm.com>, linux-kernel@...r.kernel.org,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Peter Zijlstra <peterz@...radead.org>,
 Valentin Schneider <vschneid@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16



On 3/14/2024 10:51 AM, Gautham R. Shenoy wrote:
> Hello Ingo, Shrikanth,
> 
> On Tue, Mar 12, 2024 at 11:01:54AM +0100, Ingo Molnar wrote:
>>
>> * Shrikanth Hegde <sshegde@...ux.ibm.com> wrote:
>>
>>>
>>>
>>> On 3/8/24 4:28 PM, Ingo Molnar wrote:
>>>> We changed the order of definitions within 'enum cpu_idle_type',
>>>> which changed the order of [CPU_MAX_IDLE_TYPES] columns in
>>>> show_schedstat().
>>>>
>>>
>>>> +++ b/kernel/sched/stats.c
>>>> @@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
>>>>    * Bump this up when changing the output format or the meaning of an existing
>>>>    * format, so that tools can adapt (or abort)
>>>>    */
>>>> -#define SCHEDSTAT_VERSION 15
>>>> +#define SCHEDSTAT_VERSION 16
>>>
>>> Please also briefly document the new version and the change of
>>> ordering in Documentation/scheduler/sched-stats.rst.
>>
>> Good point, I've added this paragraph to sched-stats.rst:
>>
>>   +Version 16 of schedstats changed the order of definitions within
>>   +'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
>>   +columns in show_schedstat(). In particular the position of CPU_IDLE
>>   +and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
> 
> Thanks for this change!
> 
> Since we are considering bumping up the version for this change, it
> might be good to get rid of "lb_imbalance", which is currently a sum of
> all the different kinds of imbalances (due to load, utilization,
> number of tasks, misfit tasks). Such an aggregate is not very useful.
> 
> Swapnil had a patch to print the different imbalances, which will
> change the number of fields. Can we include this change into v16 as
> well?
> 
> Swapnil, could you please rebase your patch on tip/sched/core and post
> it?
Please find the patch below. This is based on tip/sched/core.

---
 From deeed5bf937bddf227deb1cdb9e2e6c164c48053 Mon Sep 17 00:00:00 2001
From: Swapnil Sapkal <swapnil.sapkal@....com>
Date: Thu, 15 Jun 2023 04:55:09 +0000
Subject: [PATCH] sched: Report the different kinds of imbalances in /proc/schedstat

In /proc/schedstat, lb_imbalance reports the sum of imbalances
discovered in sched domains with each call to sched_balance_rq(), which is
not very useful because lb_imbalance is an aggregate of the imbalances
due to load, utilization, nr_tasks and misfit_tasks. Remove this field
from /proc/schedstat.

Currently there are no fields in /proc/schedstat to report different types
of imbalances. Introduce new fields in /proc/schedstat to report the
total imbalances in load, utilization, nr_tasks or misfit_tasks.

Added fields to /proc/schedstat:
	- lb_imbalance_load: Total imbalance due to load.
	- lb_imbalance_util: Total imbalance due to utilization.
	- lb_imbalance_task: Total imbalance due to number of tasks.
	- lb_imbalance_misfit: Total imbalance due to misfit tasks.

Signed-off-by: Swapnil Sapkal <swapnil.sapkal@....com>
---
  Documentation/scheduler/sched-stats.rst | 104 +++++++++++++-----------
  include/linux/sched/topology.h          |   5 +-
  kernel/sched/fair.c                     |  15 +++-
  kernel/sched/stats.c                    |   7 +-
  4 files changed, 80 insertions(+), 51 deletions(-)

diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 7c2b16c4729d..d6e9a8a5619c 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -6,6 +6,9 @@ Version 16 of schedstats changed the order of definitions within
  'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
  columns in show_schedstat(). In particular the position of CPU_IDLE
  and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
+Also stop reporting 'lb_imbalance', which is no longer informative,
+and instead add the more specific fields 'lb_imbalance_load',
+'lb_imbalance_util', 'lb_imbalance_task' and 'lb_imbalance_misfit'.
  
  Version 15 of schedstats dropped counters for some sched_yield:
  yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
@@ -73,86 +76,93 @@ One of these is produced per domain for each cpu described. (Note that if
  CONFIG_SMP is not defined, *no* domains are utilized and these lines
  will not appear in the output.)
  
-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
  
  The first field is a bit mask indicating what cpus this domain operates over.
  
-The next 24 are a variety of sched_balance_rq() statistics in grouped into types
+The next 33 are a variety of sched_balance_rq() statistics grouped into types
  of idleness (idle, busy, and newly idle):
  
      1)  # of times in this domain sched_balance_rq() was called when the
+        cpu was busy
+    2)  # of times in this domain sched_balance_rq() checked but found the
+        load did not require balancing when busy
+    3)  # of times in this domain sched_balance_rq() tried to move one or
+        more tasks and failed, when the cpu was busy
+    4)  Total imbalance in load when the cpu was busy
+    5)  Total imbalance in utilization when the cpu was busy
+    6)  Total imbalance in number of tasks when the cpu was busy
+    7)  Total imbalance due to misfit tasks when the cpu was busy
+    8)  # of times in this domain pull_task() was called when busy
+    9)  # of times in this domain pull_task() was called even though the
+        target task was cache-hot when busy
+    10) # of times in this domain sched_balance_rq() was called but did not
+        find a busier queue while the cpu was busy
+    11) # of times in this domain a busier queue was found while the cpu
+        was busy but no busier group was found
+
+    12) # of times in this domain sched_balance_rq() was called when the
          cpu was idle
-    2)  # of times in this domain sched_balance_rq() checked but found
+    13) # of times in this domain sched_balance_rq() checked but found
          the load did not require balancing when the cpu was idle
-    3)  # of times in this domain sched_balance_rq() tried to move one or
+    14) # of times in this domain sched_balance_rq() tried to move one or
          more tasks and failed, when the cpu was idle
-    4)  sum of imbalances discovered (if any) with each call to
-        sched_balance_rq() in this domain when the cpu was idle
-    5)  # of times in this domain pull_task() was called when the cpu
+    15) Total imbalance in load when the cpu was idle
+    16) Total imbalance in utilization when the cpu was idle
+    17) Total imbalance in number of tasks when the cpu was idle
+    18) Total imbalance due to misfit tasks when the cpu was idle
+    19) # of times in this domain pull_task() was called when the cpu
          was idle
-    6)  # of times in this domain pull_task() was called even though
+    20) # of times in this domain pull_task() was called even though
          the target task was cache-hot when idle
-    7)  # of times in this domain sched_balance_rq() was called but did
+    21) # of times in this domain sched_balance_rq() was called but did
          not find a busier queue while the cpu was idle
-    8)  # of times in this domain a busier queue was found while the
+    22) # of times in this domain a busier queue was found while the
          cpu was idle but no busier group was found
-    9)  # of times in this domain sched_balance_rq() was called when the
-        cpu was busy
-    10) # of times in this domain sched_balance_rq() checked but found the
-        load did not require balancing when busy
-    11) # of times in this domain sched_balance_rq() tried to move one or
-        more tasks and failed, when the cpu was busy
-    12) sum of imbalances discovered (if any) with each call to
-        sched_balance_rq() in this domain when the cpu was busy
-    13) # of times in this domain pull_task() was called when busy
-    14) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when busy
-    15) # of times in this domain sched_balance_rq() was called but did not
-        find a busier queue while the cpu was busy
-    16) # of times in this domain a busier queue was found while the cpu
-        was busy but no busier group was found
  
-    17) # of times in this domain sched_balance_rq() was called when the
-        cpu was just becoming idle
-    18) # of times in this domain sched_balance_rq() checked but found the
+    23) # of times in this domain sched_balance_rq() was called when the
+        cpu was just becoming idle
+    24) # of times in this domain sched_balance_rq() checked but found the
          load did not require balancing when the cpu was just becoming idle
-    19) # of times in this domain sched_balance_rq() tried to move one or more
+    25) # of times in this domain sched_balance_rq() tried to move one or more
          tasks and failed, when the cpu was just becoming idle
-    20) sum of imbalances discovered (if any) with each call to
-        sched_balance_rq() in this domain when the cpu was just becoming idle
-    21) # of times in this domain pull_task() was called when newly idle
-    22) # of times in this domain pull_task() was called even though the
+    26) Total imbalance in load when the cpu was just becoming idle
+    27) Total imbalance in utilization when the cpu was just becoming idle
+    28) Total imbalance in number of tasks when the cpu was just becoming idle
+    29) Total imbalance due to misfit tasks when the cpu was just becoming idle
+    30) # of times in this domain pull_task() was called when newly idle
+    31) # of times in this domain pull_task() was called even though the
          target task was cache-hot when just becoming idle
-    23) # of times in this domain sched_balance_rq() was called but did not
+    32) # of times in this domain sched_balance_rq() was called but did not
          find a busier queue while the cpu was just becoming idle
-    24) # of times in this domain a busier queue was found while the cpu
+    33) # of times in this domain a busier queue was found while the cpu
          was just becoming idle but no busier group was found
  
     Next three are active_load_balance() statistics:
  
-    25) # of times active_load_balance() was called
-    26) # of times active_load_balance() tried to move a task and failed
-    27) # of times active_load_balance() successfully moved a task
+    34) # of times active_load_balance() was called
+    35) # of times active_load_balance() tried to move a task and failed
+    36) # of times active_load_balance() successfully moved a task
  
     Next three are sched_balance_exec() statistics:
  
-    28) sbe_cnt is not used
-    29) sbe_balanced is not used
-    30) sbe_pushed is not used
+    37) sbe_cnt is not used
+    38) sbe_balanced is not used
+    39) sbe_pushed is not used
  
     Next three are sched_balance_fork() statistics:
  
-    31) sbf_cnt is not used
-    32) sbf_balanced is not used
-    33) sbf_pushed is not used
+    40) sbf_cnt is not used
+    41) sbf_balanced is not used
+    42) sbf_pushed is not used
  
     Next three are try_to_wake_up() statistics:
  
-    34) # of times in this domain try_to_wake_up() awoke a task that
+    43) # of times in this domain try_to_wake_up() awoke a task that
          last ran on a different cpu in this domain
-    35) # of times in this domain try_to_wake_up() moved a task to the
+    44) # of times in this domain try_to_wake_up() moved a task to the
          waking cpu because it was cache-cold on its own cpu anyway
-    36) # of times in this domain try_to_wake_up() started passive balancing
+    45) # of times in this domain try_to_wake_up() started passive balancing
  
  /proc/<pid>/schedstat
  ---------------------
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index c8fe9bab981b..15685c40a713 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -114,7 +114,10 @@ struct sched_domain {
  	unsigned int lb_count[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
-	unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_load[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_util[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_task[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_misfit[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a19ea290b790..515258f97ba3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11288,7 +11288,20 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
  
  	WARN_ON_ONCE(busiest == env.dst_rq);
  
-	schedstat_add(sd->lb_imbalance[idle], env.imbalance);
+	switch (env.migration_type) {
+	case migrate_load:
+		schedstat_add(sd->lb_imbalance_load[idle], env.imbalance);
+		break;
+	case migrate_util:
+		schedstat_add(sd->lb_imbalance_util[idle], env.imbalance);
+		break;
+	case migrate_task:
+		schedstat_add(sd->lb_imbalance_task[idle], env.imbalance);
+		break;
+	case migrate_misfit:
+		schedstat_add(sd->lb_imbalance_misfit[idle], env.imbalance);
+		break;
+	}
  
  	env.src_cpu = busiest->cpu;
  	env.src_rq = busiest;
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 78e48f5426ee..a02bc9db2f1c 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -151,11 +151,14 @@ static int show_schedstat(struct seq_file *seq, void *v)
  			seq_printf(seq, "domain%d %*pb", dcount++,
  				   cpumask_pr_args(sched_domain_span(sd)));
  			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
-				seq_printf(seq, " %u %u %u %u %u %u %u %u",
+				seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u",
  				    sd->lb_count[itype],
  				    sd->lb_balanced[itype],
  				    sd->lb_failed[itype],
-				    sd->lb_imbalance[itype],
+				    sd->lb_imbalance_load[itype],
+				    sd->lb_imbalance_util[itype],
+				    sd->lb_imbalance_task[itype],
+				    sd->lb_imbalance_misfit[itype],
  				    sd->lb_gained[itype],
  				    sd->lb_hot_gained[itype],
  				    sd->lb_nobusyq[itype],
-- 
2.34.1
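For consumers such as the sched scoreboard discussed later in this thread, the v16 layout above implies a per-domain line of 45 counters, 11 per idle type. Below is a minimal, untested Python sketch of a version-aware helper that pulls out the four new imbalance counters; the function name and return shape are illustrative, and the field positions are assumed from the updated sched-stats.rst text in the patch above, not from any existing tool:

```python
# Hypothetical helper for schedstat v16 "domain<N>" lines, based on the
# patch in this thread. Field positions follow the updated
# sched-stats.rst: 11 load-balance counters per idle type, with the
# four imbalance counters at offsets 3..6 within each group.

FIELDS_PER_IDLE_TYPE = 11                  # lb_count .. lb_nobusyg per seq_printf
IDLE_TYPES = ("busy", "idle", "newidle")   # order after the enum reordering
IMBALANCE_NAMES = ("load", "util", "task", "misfit")

def parse_domain_line(line, version=16):
    """Return {idle_type: {imbalance_kind: value}} for one domain<N> line."""
    if version != 16:
        raise ValueError("this sketch only understands schedstat v16")
    parts = line.split()
    # parts[0] = "domainN", parts[1] = cpumask; counters start at parts[2]
    stats = [int(x) for x in parts[2:2 + 3 * FIELDS_PER_IDLE_TYPE]]
    result = {}
    for i, itype in enumerate(IDLE_TYPES):
        base = i * FIELDS_PER_IDLE_TYPE
        # base+3 .. base+6 are lb_imbalance_{load,util,task,misfit}
        result[itype] = dict(zip(IMBALANCE_NAMES, stats[base + 3:base + 7]))
    return result
```

A tool would dispatch on the "version NN" header line first and fall back to its v15 parser (or refuse to parse) for older kernels, which matches how the scoreboard is described as handling version changes below.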


> 
>>
>>> One recent user that I recollect is the sched scoreboard. The version
>>> number should help users notice when it breaks their scripts.
>>>
> 
>>> +Gautham, Any thoughts?
> 
> Yes, the scoreboard looks at the version. If it doesn't understand a
> version, it will not parse it.  It can be extended to understand the
> new version, since that's only about adding a new json.
> 
>>
>> If it's really a problem, I suspect we could maintain the v15 order of
>> output.
> 
> I don't think moving to v16 is an issue.
> 
>>
>> Thanks,
>>
>> 	Ingo
> 
> --
> Thanks and Regards
> gautham.

--
Thanks and Regards,
Swapnil
