[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0904110931240.21577@localhost.localdomain>
Date: Sat, 11 Apr 2009 11:51:36 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Dimitri Sivanich <sivanich@....com>
cc: LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>, john stultz <johnstul@...ibm.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2] Move calc_load call out from xtime_lock protection
On Fri, 10 Apr 2009, Dimitri Sivanich wrote:
> The xtime_lock is being held for long periods on larger systems due
> to an extensive amount of time being spent in calc_load(),
> specifically here:
> do_timer->update_times->calc_load->count_active_tasks->nr_active()
>
> On a 64 cpu system I've seen this take approximately 55 usec.
> Presumably it would be worse on larger systems. This causes other
> cpus to be held off in places such as
> scheduler_tick->sched_clock_tick waiting for the xtime_lock to be
> released.
I thought more about that. Why don't we move the calc_load() call into
the timer softirq context and avoid fiddling with all the call sites ?
Also moving calc_load out of the timer interrupt context reduces the
interrupts off section as well.
Thanks,
tglx
--------->
Subject: timer: move calc_load to softirq
From: Thomas Gleixner <tglx@...utronix.de>
Date: Sat, 11 Apr 2009 10:43:41 +0200
xtime_lock is held write locked across calc_load() which iterates over
all online CPUs. That can cause long latencies for xtime_lock readers
on large SMP systems.
Move the calculation to the softirq and reduce the xtime_lock write
locked section. This also reduces the interrupts off section.
Inspired by a inital patch from Dimitri Sivanich.
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
---
kernel/timer.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 2 deletions(-)
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -1139,9 +1139,12 @@ static unsigned long count_active_tasks(
* Requires xtime_lock to access.
*/
unsigned long avenrun[3];
-
EXPORT_SYMBOL(avenrun);
+static atomic_t avenrun_ticks;
+static DEFINE_SPINLOCK(avenrun_lock);
+static DEFINE_PER_CPU(int, avenrun_calculate);
+
/*
* calc_load - given tick count, update the avenrun load estimates.
* This is called while holding a write_lock on xtime_lock.
@@ -1164,12 +1167,56 @@ static inline void calc_load(unsigned lo
}
/*
+ * Check whether we need to calculate load.
+ */
+static void check_calc_load(void)
+{
+ int ticks, *calc = &__get_cpu_var(avenrun_calculate);
+
+ /*
+ * The trigger is set in the timer interrupt when this CPU
+ * called do_timer(). We handle this sloppy w/o disabling
+ * interrupts. If the trigger is set after we cleared it we
+ * might look at a stale trigger in the next cycle, but then
+ * we check anyway whether avenrun_ticks is > 0. Normally the
+ * do_timer() call is bound to a particular CPU except for the
+ * NOHZ case where a CPU going into a long idle sleep drops
+ * the do_timer() duty. In the case that another timer
+ * interrupt happens right after we return from this function,
+ * then we run the calculation in the next cycle or in the
+ * nohz case if we give up the do_timer() duty then the next
+ * CPU which calls do_timer() will take care of the
+ * unaccounted ticks. calc_load is not a precise accounting so
+ * having some lag is not hurting.
+ */
+ if (!*calc)
+ return;
+
+ *calc = 0;
+
+ while (atomic_read(&avenrun_ticks)) {
+ /*
+ * avenrun_lock serializes the decrement of
+ * avenrun_ticks and the avenrun calculation.
+ */
+ spin_lock(&avenrun_lock);
+ ticks = atomic_read(&avenrun_ticks);
+ if (ticks) {
+ atomic_sub(ticks, &avenrun_ticks);
+ calc_load(ticks);
+ }
+ spin_unlock(&avenrun_lock);
+ }
+}
+
+/*
* This function runs timers and the timer-tq in bottom half context.
*/
static void run_timer_softirq(struct softirq_action *h)
{
struct tvec_base *base = __get_cpu_var(tvec_bases);
+ check_calc_load();
hrtimer_run_pending();
if (time_after_eq(jiffies, base->timer_jiffies))
@@ -1193,7 +1240,9 @@ void run_local_timers(void)
static inline void update_times(unsigned long ticks)
{
update_wall_time();
- calc_load(ticks);
+
+ atomic_add(ticks, &avenrun_ticks);
+ __get_cpu_var(avenrun_calculate) = 1;
}
/*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists