Message-Id: <20240307101945.11280-1-CruzZhao@linux.alibaba.com>
Date: Thu,  7 Mar 2024 18:19:42 +0800
From: Cruz Zhao <CruzZhao@...ux.alibaba.com>
To: tj@...nel.org,
	lizefan.x@...edance.com,
	hannes@...xchg.org,
	mingo@...hat.com,
	peterz@...radead.org,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	bristot@...hat.com,
	vschneid@...hat.com
Cc: cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH v2 0/3] introduce CPUTIME_FORCEIDLE_TASK and add

As core sched uses rq_clock() as the clock source to account forceidle
time, irq time gets accounted into forceidle time as well. In some
scenarios this makes the forceidle sum much larger than the exec
runtime; e.g., we observed that the forceidle time of a task calling
futex_wake() was 50% larger than its exec runtime, which is confusing.

In our test case, task 26281 is the task with this problem. We bound it
to cpu0, while its SMT sibling is running stress-ng -c 1. Then we sample
the forceidle time and runtime of task 26281, and the stat of cpu0:

  [root@...alhost 26281]# cat ./sched |grep -E "forceidle|sum_exec_runtime" \
      && cat /proc/stat |grep cpu0 && echo "" && sleep 10 \
      && cat ./sched |grep -E "forceidle|sum_exec_runtime" \
      && cat /proc/stat |grep cpu0
  se.sum_exec_runtime                          :          3353.788406
  core_forceidle_sum                           :          4522.497675
  core_forceidle_task_sum                      :          3354.383413
  cpu0 1368 74 190 87023149 1 2463 3308 0 0 0
  
  se.sum_exec_runtime                          :          3952.897106
  core_forceidle_sum                           :          5311.687917
  core_forceidle_task_sum                      :          3953.571613
  cpu0 1368 74 190 87024043 1 2482 3308 0 0 0

As we can see from the data, se.sum_exec_runtime increased by 600ms,
core_forceidle_sum (using rq_clock()) increased by 790ms, and
core_forceidle_task_sum (using rq_clock_task(), which subtracts irq
time) increased by 600ms, close to sum_exec_runtime.

As for the irq time from /proc/stat, it increased by 19 ticks, i.e.
190ms, close to the difference between the increments of
core_forceidle_sum and se.sum_exec_runtime.
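
A quick cross-check of those numbers (assuming the usual USER_HZ of 100,
i.e. 10ms per /proc/stat tick):

  irq ticks on cpu0:       2482 - 2463 = 19 ticks = 19 * 10ms = 190ms
  forceidle minus runtime: 790ms - 600ms ~= 190ms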

We introduce cpustat[CPUTIME_FORCEIDLE_TASK] to account for the time
that a task is actually running while its SMT siblings are forced
idle, using rq_clock_task() as the clock source.

     |<---------------------forceidle time--------------------->|
     |<--forceidle task time-->|      |<--forceidle task time-->|
     |<------exec runtime----->|      |<-----exec runtime------>|
ht0  |          A running      | irq  |         A running       |

ht1  |                          idle                            |
     |                        B queuing                         |
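
The accounting idea, as a minimal sketch in kernel style (illustrative
only: apart from rq_clock() and rq_clock_task(), the helper and field
names below are placeholders; the actual hooks live in
kernel/sched/core_sched.c and kernel/sched/cputime.c):

  /*
   * Illustrative sketch: sample both clocks over the same forced-idle
   * window.  rq_clock() keeps advancing across irq handling while
   * rq_clock_task() does not, so the _task sum only grows while the
   * task is actually executing.
   */
  static void account_forceidle_window(struct rq *rq, struct task_struct *p,
                                       u64 start, u64 start_task)
  {
          u64 delta = rq_clock(rq) - start;                /* includes irq time */
          u64 delta_task = rq_clock_task(rq) - start_task; /* irq time subtracted */

          p->stats.core_forceidle_sum += delta;            /* existing counter */
          p->stats.core_forceidle_task_sum += delta_task;  /* new counter */
  }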

Interfaces:
 - task level: /proc/$pid/sched, row core_forceidle_task_sum.
 - cgroup level: /sys/fs/cgroup/$cg/cpu.stat, row
     core_sched.force_idle_task_usec.
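
For example, after this series the new rows show up as follows (the
values here are just for illustration):

  $ grep forceidle /proc/26281/sched
  core_forceidle_sum                           :          5311.687917
  core_forceidle_task_sum                      :          3953.571613

  $ grep force_idle /sys/fs/cgroup/$cg/cpu.stat
  core_sched.force_idle_usec 5311687
  core_sched.force_idle_task_usec 3953571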

This patch set also adds descriptions of forceidle time and
forceidle_task time to the Documentation.

v1--->v2: add descriptions of forceidle time and forceidle_task time to
the Documentation.

Cruz Zhao (3):
  Documentation: add description of forceidle time statistics
  sched/core: introduce CPUTIME_FORCEIDLE_TASK
  Documentation: add description of forceidle_task time statistics

 Documentation/admin-guide/cgroup-v2.rst       |  4 ++-
 .../admin-guide/hw-vuln/core-scheduling.rst   | 30 +++++++++++++++++++
 include/linux/cgroup-defs.h                   |  1 +
 include/linux/kernel_stat.h                   |  3 +-
 include/linux/sched.h                         |  1 +
 kernel/cgroup/rstat.c                         | 11 +++++++
 kernel/sched/core.c                           |  5 ++++
 kernel/sched/core_sched.c                     |  5 +++-
 kernel/sched/cputime.c                        |  4 ++-
 kernel/sched/debug.c                          |  1 +
 kernel/sched/sched.h                          |  1 +
 11 files changed, 62 insertions(+), 4 deletions(-)

-- 
2.39.3

