Message-Id: <78143cce1e94d35e6fe4e36b5e44cd2d4433c4ee.1430140376.git.viresh.kumar@linaro.org>
Date:	Mon, 27 Apr 2015 18:52:51 +0530
From:	Viresh Kumar <viresh.kumar@...aro.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	linaro-kernel@...ts.linaro.org, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Viresh Kumar <viresh.kumar@...aro.org>
Subject: [PATCH V2] timer: Migrate running timer to avoid waking up an idle-core

Currently, when adding new timers, we try to migrate them to a non-idle
core if the local core is idle. However, we don't do this for timers
that are re-armed from their handler. These running timers can wake up
an idle core when they could have been serviced by any non-idle core,
potentially hurting power savings by preventing cores from staying in
deep idle states.
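For context, this is roughly how __mod_timer() already picks a target
base for a newly armed timer (a simplified sketch, not the verbatim
kernel code):

	/* prefer a non-idle CPU when the local one is idle */
	cpu = get_nohz_timer_target(pinned);
	new_base = per_cpu(tvec_bases, cpu);

The re-arm-from-handler path currently misses this selection, because
a running timer cannot safely change its base.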

Migration of a running timer is race-prone in two ways:

1. Serialization of the timer with itself: if the timer is migrated,
   it may fire on the new base before its handler has finished
   executing on the old base.

2. Deletion of the timer with del_timer_sync(): del_timer_sync()
   deletes a timer only if its handler isn't currently running, which
   it checks by comparing the timer against the running_timer of its
   base (see the sketch below). If we migrate the timer to a new base,
   del_timer_sync() might delete it while its handler is still running
   on the old base, because it will check against the running_timer of
   the new base and not find it running.
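For reference, the check del_timer_sync() ultimately relies on looks
roughly like this (simplified from try_to_del_timer_sync(); locking
details and timer-stats handling omitted):

	base = lock_timer_base(timer, &flags);

	if (base->running_timer != timer) {
		/* handler is not running on this base: safe to detach */
		ret = detach_if_pending(timer, base, true);
	}

	spin_unlock_irqrestore(&base->lock, flags);

If the timer has silently moved to another base, this comparison no
longer proves that its handler has finished.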

We can fix both problems by deferring the re-queuing of the timer until
its handler has finished. By moving such timers to a special migration
list, we can process them after all expired timers have been serviced.
To avoid re-evaluating a suitable new base for the timers on the
migration list, __mod_timer() preserves its choice in the base's
preferred_target field.
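In outline, the flow the patch implements is (pseudo-C sketch; the
real code is in the diff below):

	/* __mod_timer(): this timer's handler is currently running */
	if (base->running_timer == timer) {
		timer->expires = expires;
		base->preferred_target = new_base;
		list_add_tail(&timer->entry, &base->migration_list);
		/* re-queued by __run_timers() once the handler is done */
	}

	/* __run_timers(): after all expired timers have been serviced */
	while (!list_empty(&base->migration_list)) {
		/* move each deferred timer to preferred_target (or to a
		 * re-evaluated base) via internal_add_timer() */
	}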

Suggested-by: Thomas Gleixner <tglx@...utronix.de>
Signed-off-by: Viresh Kumar <viresh.kumar@...aro.org>
---
V1->V2:
- Completely new approach, and this time a correct one.
- Re-queue the timer only after its handler has finished.

Tested on an Exynos (dual ARM Cortex-A9) board running Ubuntu.

The system was fairly idle, and 'dmesg > /dev/null' was run in an
infinite loop on one of the CPUs to keep it out of idle.

It was observed multiple times that migration does happen, and that
sometimes, while processing the migration list, the local CPU turns
out to be non-idle again; in such cases the timer ends up being added
back on the local CPU.

 kernel/time/timer.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2ece3aa5069c..de5aa69ab958 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -78,6 +78,8 @@ struct tvec_root {
 struct tvec_base {
 	spinlock_t lock;
 	struct timer_list *running_timer;
+	struct list_head migration_list;
+	struct tvec_base *preferred_target;
 	unsigned long timer_jiffies;
 	unsigned long next_timer;
 	unsigned long active_timers;
@@ -795,10 +797,18 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 		 * We are trying to schedule the timer on the local CPU.
 		 * However we can't change timer's base while it is running,
 		 * otherwise del_timer_sync() can't detect that the timer's
-		 * handler yet has not finished. This also guarantees that
-		 * the timer is serialized wrt itself.
+		 * handler yet has not finished.
+		 *
+		 * Move timer to migration_list which can be processed after all
+		 * expired timers are serviced.  This also guarantees that the
+		 * timer is serialized wrt itself.
 		 */
-		if (likely(base->running_timer != timer)) {
+		if (unlikely(base->running_timer == timer)) {
+			timer->expires = expires;
+			base->preferred_target = new_base;
+			list_add_tail(&timer->entry, &base->migration_list);
+			goto out_unlock;
+		} else {
 			/* See the comment in lock_timer_base() */
 			timer_set_base(timer, NULL);
 			spin_unlock(&base->lock);
@@ -1228,6 +1238,41 @@ static inline void __run_timers(struct tvec_base *base)
 		}
 	}
 	base->running_timer = NULL;
+
+	/*
+	 * Process timers from migration list, as their handlers have finished
+	 * now.
+	 */
+	if (unlikely(!list_empty(&base->migration_list))) {
+		struct tvec_base *new_base = base->preferred_target;
+
+		if (!idle_cpu(base->cpu)) {
+			/* Local CPU isn't idle anymore */
+			new_base = base;
+		} else if (idle_cpu(new_base->cpu)) {
+			/* Re-evaluate base, target CPU has gone idle */
+			new_base = per_cpu(tvec_bases, get_nohz_timer_target(false));
+		}
+
+		do {
+			timer = list_first_entry(&base->migration_list,
+						 struct timer_list, entry);
+
+			__list_del(timer->entry.prev, timer->entry.next);
+
+			/* See the comment in lock_timer_base() */
+			timer_set_base(timer, NULL);
+			spin_unlock(&base->lock);
+
+			spin_lock(&new_base->lock);
+			timer_set_base(timer, new_base);
+			internal_add_timer(new_base, timer);
+			spin_unlock(&new_base->lock);
+
+			spin_lock(&base->lock);
+		} while (!list_empty(&base->migration_list));
+	}
+
 	spin_unlock_irq(&base->lock);
 }
 
@@ -1635,6 +1680,7 @@ static void __init init_timer_cpu(struct tvec_base *base, int cpu)
 	for (j = 0; j < TVR_SIZE; j++)
 		INIT_LIST_HEAD(base->tv1.vec + j);
 
+	INIT_LIST_HEAD(&base->migration_list);
 	base->timer_jiffies = jiffies;
 	base->next_timer = base->timer_jiffies;
 }
-- 
2.3.0.rc0.44.ga94655d
