lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240528003521.979836-28-ankur.a.arora@oracle.com>
Date: Mon, 27 May 2024 17:35:13 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, peterz@...radead.org, torvalds@...ux-foundation.org,
        paulmck@...nel.org, rostedt@...dmis.org, mark.rutland@....com,
        juri.lelli@...hat.com, joel@...lfernandes.org, raghavendra.kt@....com,
        sshegde@...ux.ibm.com, boris.ostrovsky@...cle.com,
        konrad.wilk@...cle.com, Ankur Arora <ankur.a.arora@...cle.com>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: [PATCH v2 27/35] sched: support preempt=none under PREEMPT_AUTO

The default preemption policy for the no forced preemption model under
PREEMPT_AUTO is to always schedule lazily for well-behaved, non-idle
tasks, preempting at exit-to-user.

We already have that, so enable it.

Comparing a scheduling/IPC workload:

 # perf stat -a -e cs --repeat 10 --  perf bench sched messaging -g 20 -t -l 5000

 PREEMPT_AUTO, preempt=none

        3,173,961             context-switches      ( +-  0.60% )

           3.03058 +- 0.00621 seconds time elapsed  ( +- 0.20% )

 PREEMPT_DYNAMIC, preempt=none

         2,942,664      context-switches            ( +-  0.49% )

           3.18924 +- 0.00483 seconds time elapsed  ( +-  0.15% )

Both perform similarly, but we incur a slightly higher number of
context-switches with PREEMPT_AUTO.

Drilling down we see that both voluntary and involuntary
context-switches are higher for this test:

 PREEMPT_AUTO, preempt=none

          2286219.90 +- 39510.80 voluntary context-switches   ( +- 1.72% )
	   887741.80 +- 20137.63 involuntary context-switches ( +- 2.26% )

 PREEMPT_DYNAMIC, preempt=none

          2125750.40 +- 29593.55 voluntary context-switches   ( +- 1.39% )
           816914.20 +- 13723.46 involuntary context-switches ( +- 1.67% )

Assuming voluntary context-switches due to explicit blocking are
similar, we expect that PREEMPT_AUTO will incur larger context
switches at exit-to-user (counted as voluntary) since that is its
default rescheduling point.

Involuntary context-switches, under PREEMPT_AUTO are seen when a
task has exceeded its time quanta by a tick. Under PREEMPT_DYNAMIC,
these are incurred when a task needs to be rescheduled and then
encounters a cond_resched().
So, these two numbers aren't directly comparable.

Comparing a kernbench workload:

  # Half load (-j 32)

                         PREEMPT_AUTO                           PREEMPT_DYNAMIC

  wall            74.41 +-     0.45 ( +-  0.60% )         74.20 +-    0.33 sec ( +- 0.45% )
  utime         1419.78 +-     2.04 ( +-  0.14% )       1416.40 +-    6.07 sec ( +- 0.42% )
  stime          247.70 +-     0.88 ( +-  0.35% )        246.23 +-    1.20 sec ( +- 0.49% )
  %cpu          2240.20 +-    16.03 ( +-  0.71% )       2240.20 +-   19.34     ( +- 0.86% )
  inv-csw      13056.00 +-   427.58 ( +-  3.27% )      18750.60 +-  771.21     ( +- 4.11% )
  vol-csw     191000.00 +-  1623.25 ( +-  0.84% )     182857.00 +- 2373.12     ( +- 1.29% )

The runtimes are basically identical for both of these. Voluntary
context switches, as above (and in the optimal, maximal runs below),
are higher. Which as mentioned above, does add up.

However, unlike the sched-messaging workload, the involuntary
context-switches are generally lower (also true for the optimal,
maximal runs.) One reason for that might be that kbuild spends
~20% time executing in the kernel, while sched-messaging spends ~95%
time in the kernel. Which means a greater likelihood of being
preempted due to exceeding its time quanta.

Cc: Ingo Molnar <mingo@...hat.com>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Peter Ziljstra <peterz@...radead.org>
Originally-by: Thomas Gleixner <tglx@...utronix.de>
Link: https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/
Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2bc7f636267d..c3ba33c77053 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8983,7 +8983,9 @@ static void __sched_dynamic_update(int mode)
 {
 	switch (mode) {
 	case preempt_dynamic_none:
-		preempt_dynamic_mode = preempt_dynamic_undefined;
+		if (mode != preempt_dynamic_mode)
+			pr_info("%s: none\n", PREEMPT_MODE);
+		preempt_dynamic_mode = mode;
 		break;
 
 	case preempt_dynamic_voluntary:
-- 
2.31.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ