Message-ID: <170973404551.398.9744367892734721294.tip-bot2@tip-bot2>
Date: Wed, 06 Mar 2024 14:07:25 -0000
From: "tip-bot2 for Frederic Weisbecker" <tip-bot2@...utronix.de>
To: linux-tip-commits@...r.kernel.org
Cc: Frederic Weisbecker <frederic@...nel.org>,
 Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
 linux-kernel@...r.kernel.org
Subject:
 [tip: timers/core] timer/migration: Fix quick check reporting late expiry

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     8ca1836769d758e4fbf5851bb81e181c52193f5d
Gitweb:        https://git.kernel.org/tip/8ca1836769d758e4fbf5851bb81e181c52193f5d
Author:        Frederic Weisbecker <frederic@...nel.org>
AuthorDate:    Tue, 05 Mar 2024 01:28:22 +01:00
Committer:     Thomas Gleixner <tglx@...utronix.de>
CommitterDate: Wed, 06 Mar 2024 15:02:09 +01:00

timer/migration: Fix quick check reporting late expiry

When a CPU is the last active one in the hierarchy and it tries to enter
idle, the quick check that looks up the next event for the cpuidle
heuristics may report an expiry that is too late, as in the following
scenario:

                        [GRP1:0]
                     migrator = NONE
                     active   = NONE
                     nextevt  = T0:0, T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = NONE           migrator = NONE
       active   = NONE           active   = NONE
       nextevt  = T0, T1         nextevt  = T2
       /         \                /         \
      0           1              2           3
    idle       idle           idle         idle

0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has
a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The
expiry order is T0 < T1 < T2.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:0(i), T0:1
                   /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T0(i), T1      nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

1) CPU 0 becomes active. The (i) means a now ignored timer.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T1             nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

2) CPU 0 handles remote timers. No timer actually expired, but the
   ignored timers have been cleaned out while their siblings' timers
   haven't been propagated upwards. As a result the top level's next
   event is T2 and not T1.

3) CPU 0 tries to enter idle without any global timer enqueued and calls
   tmigr_quick_check(). The expiry of T2 is returned instead of the
   expiry of T1.
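
For illustration only (not part of the patch), here is a minimal
user-space sketch of the state after step 2. The struct layout, field
names and expiry values are hypothetical stand-ins for the real tmigr
structures, chosen just to show why consulting only the top level's
next_expiry reports T2 instead of T1.

/* Toy model of the stale top-level expiry (illustrative, not kernel code). */
#include <stdio.h>
#include <stdint.h>

#define KTIME_MAX INT64_MAX

struct group {
	struct group *parent;
	int64_t next_expiry;
};

int main(void)
{
	/* State after step 2: GRP0:0 knows about T1, GRP1:0 still holds T2. */
	struct group grp1_0 = { .parent = NULL,    .next_expiry = 300 /* T2 */ };
	struct group grp0_0 = { .parent = &grp1_0, .next_expiry = 200 /* T1 */ };

	/* CPU 0 enters idle with no global timer of its own queued. */
	int64_t nextevt = KTIME_MAX;

	/* Old quick check: walk to the top and consult only that expiry. */
	struct group *g = &grp0_0;
	while (g->parent)
		g = g->parent;
	if (g->next_expiry < nextevt)
		nextevt = g->next_expiry;

	printf("old quick check reports %lld (T2), although T1 = 200 is earlier\n",
	       (long long)nextevt);
	return 0;
}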

When the quick check returns an expiry that is too late, the cpuidle
governor may pick a C-state that is too deep. This may result in
undesired CPU wakeup latency if the next timer is actually close enough.

Fix this by not assuming that expiries are sorted top-down while
performing the quick check. Instead, pick up the earliest expiry
encountered while walking up the hierarchy.
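
As a minimal sketch of the corrected walk, using the same toy model as
above (again an illustration with assumed names and values, not the
kernel implementation): take the minimum of the quick-check candidate
and every group's next_expiry on the way up, instead of only the top
level's value.

#include <stdio.h>
#include <stdint.h>

#define KTIME_MAX INT64_MAX

struct group {
	struct group *parent;
	int64_t next_expiry;
};

/* Expiries may not be sorted bottom-up, so keep the lowest one seen. */
static int64_t quick_check(struct group *g, int64_t nextevt)
{
	for (; g; g = g->parent)
		if (g->next_expiry < nextevt)
			nextevt = g->next_expiry;
	return nextevt;
}

int main(void)
{
	struct group grp1_0 = { .parent = NULL,    .next_expiry = 300 /* T2, stale */ };
	struct group grp0_0 = { .parent = &grp1_0, .next_expiry = 200 /* T1 */ };

	printf("fixed quick check reports %lld (T1)\n",
	       (long long)quick_check(&grp0_0, KTIME_MAX));
	return 0;
}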

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Link: https://lore.kernel.org/r/20240305002822.18130-1-frederic@kernel.org

---
 kernel/time/timer_migration.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index d85aa2a..8f49b6b 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1385,11 +1385,11 @@ u64 tmigr_cpu_deactivate(u64 nextexp)
  *			  single group active on the way to top level)
  * * nextevt		- when CPU is offline and has to handle timer on his own
  *			  or when on the way to top in every group only a single
- *			  child is active and but @nextevt is before next_expiry
- *			  of top level group
- * * next_expiry (top)	- value of top level group, when on the way to top in
- *			  every group only a single child is active and @nextevt
- *			  is after this value active child.
+ *			  child is active but @nextevt is before the lowest
+ *			  next_expiry encountered while walking up to top level.
+ * * next_expiry	- value of lowest expiry encountered while walking groups
+ *			  if only a single child is active on each and @nextevt
+ *			  is after this lowest expiry.
  */
 u64 tmigr_quick_check(u64 nextevt)
 {
@@ -1408,10 +1408,16 @@ u64 tmigr_quick_check(u64 nextevt)
 	do {
 		if (!tmigr_check_lonely(group)) {
 			return KTIME_MAX;
-		} else if (!group->parent) {
-			u64 first_global = READ_ONCE(group->next_expiry);
-
-			return min_t(u64, nextevt, first_global);
+		} else {
+			/*
+			 * Since current CPU is active, events may not be sorted
+			 * from bottom to the top because the CPU's event is ignored
+			 * up to the top and its sibling's events not propagated upwards.
+			 * Thus keep track of the lowest observed expiry.
+			 */
+			nextevt = min_t(u64, nextevt, READ_ONCE(group->next_expiry));
+			if (!group->parent)
+				return nextevt;
 		}
 		group = group->parent;
 	} while (group);
